Abstract: Document clustering has become a popular research topic in text mining because it helps retrieve required information from the large volume of information collected from the Internet. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections and require special handling for high dimensionality and high volume. We propose the OCFI (Ontology and Closed Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering. OCFI uses common words to cluster documents and builds a hierarchical topic tree. In addition, OCFI uses an ontology to address the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of all frequent itemsets, which increases efficiency and scalability. The experimental results reveal that our method is more effective than well-known document clustering algorithms. The clustering results can be used in personalized search services to help users obtain the information they need.
Index Terms: OCFI, document clustering, ontology, closed frequent itemsets.
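To illustrate why closed frequent itemsets are more compact than the full set of frequent itemsets, the following is a minimal brute-force sketch in Python. It is not the OCFI implementation: the function names `frequent_itemsets` and `closed_itemsets`, the toy documents, and the support threshold are all assumptions made for illustration, and the ontology step is not modeled. An itemset is treated as closed when no proper superset occurs in the same number of documents.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Return {itemset: count} for every term set found in at least
    min_support documents (brute force, suitable for tiny inputs only)."""
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            count = sum(1 for d in docs if items <= d)
            if count >= min_support:
                result[items] = count
                found = True
        if not found:
            # Support is anti-monotone: if no set of this size is
            # frequent, no larger set can be frequent either.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


# Hypothetical toy corpus: each document is reduced to a set of terms.
docs = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit"},
    {"apple", "fruit", "pie"},
    {"apple", "phone"},
    {"car", "engine"},
]

frequent = frequent_itemsets(docs, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed")
```

In this toy corpus, `{"fruit"}` is frequent but not closed, because every document containing "fruit" also contains "apple"; the closed itemset `{"apple", "fruit"}` carries the same information with one fewer candidate to process.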
Cheng-Jhe Lee and Chiun-Chieh Hsu are with the Department of Information Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, R.O.C. (e-mail: email@example.com).
Da-Ren Chen is with the Department of Information Management, National Taichung University of Science and Technology, Taichung, Taiwan, R.O.C. (e-mail: firstname.lastname@example.org).
Cite: Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen, "A Hierarchical Document Clustering Approach with Frequent Itemsets," International Journal of Engineering and Technology, vol. 9, no. 2, pp. 174-178, 2017.