Semantics based clustering through cover-kmeans with ontovsm for information retrieval
Document Type
Article
Publication Date
1-1-2020
Abstract
Document clustering plays a significant role in information retrieval: it seeks to divide documents into groups automatically according to the similarity of their content. Each cluster consists of documents that are related to one another (high intra-cluster similarity) and dissimilar to the documents in other clusters (low inter-cluster similarity). Document clustering is an unsupervised process that groups documents by identifying underlying structure, so there is no need to specify the correct output for a given input. Earlier clustering methods do not take the semantic associations between words into account, so the context of documents cannot be interpreted correctly. To address this problem, semantic ontology resources such as WordNet have been widely used to enhance text clustering consistency. This paper first proposes an OntoVSM model to reduce the dimension of the document representation efficiently. It then proposes the cover K-means clustering algorithm for semantic document clustering. The proposed algorithm is a hybrid of K-means and the cover coefficient-based clustering methodology (C3M), improved semantically using the WordNet ontology. The dimensionality reduction, based on the semantic knowledge of each term, preserves the information without loss. The performance of the proposed work is analyzed experimentally, and the results show that it gives improved results compared to other standard methods.
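The abstract does not spell out how K-means and C3M are combined. One plausible reading, assumed here purely for illustration, is that the cover-coefficient matrix supplies the number of clusters and the initial seed documents, and K-means then refines the assignment. The sketch below follows the usual C3M definitions (decoupling coefficient as the diagonal of the cover-coefficient matrix, seed power as decoupling times coupling times row sum). The function names `cover_coefficient_matrix` and `cover_seeded_kmeans` are hypothetical, the WordNet-based OntoVSM reduction is left out, and the input is assumed to be a document-term matrix in which every document and every term has at least one nonzero entry. It is not the authors' implementation.

```python
import numpy as np


def cover_coefficient_matrix(D):
    """C3M cover-coefficient matrix: c_ij = alpha_i * sum_k d_ik * beta_k * d_jk."""
    D = np.asarray(D, dtype=float)
    alpha = 1.0 / D.sum(axis=1)  # reciprocal row sums, one per document
    beta = 1.0 / D.sum(axis=0)   # reciprocal column sums, one per term
    return (D * alpha[:, None]) @ (D * beta[None, :]).T


def cover_seeded_kmeans(D, n_iter=20):
    """Assumed hybrid: C3M picks k and the seeds, plain K-means refines them."""
    D = np.asarray(D, dtype=float)
    delta = np.diag(cover_coefficient_matrix(D))  # decoupling coefficients
    k = max(1, int(round(delta.sum())))           # C3M estimate of cluster count
    seed_power = delta * (1.0 - delta) * D.sum(axis=1)
    seeds = np.argsort(-seed_power, kind="stable")[:k]
    centroids = D[seeds].copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every document to every centroid.
        dists = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = D[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, k


if __name__ == "__main__":
    # Toy matrix: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
    toy = [[2, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 1, 2],
           [0, 0, 1, 1]]
    labels, k = cover_seeded_kmeans(toy)
    print(k, labels)
```

The sketch does not guard against two near-identical documents both being chosen as seeds, which a fuller C3M implementation would need to handle.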
Publication Title
Information Technology and Control
First Page Number
370
Last Page Number
380
DOI
10.5755/j01.itc.49.3.25988
Recommended Citation
Lakshmana Kumar, R.; Kannammal, N.; Krishnamoorthy, Sujatha; Kadry, Seifedine; and Nam, Yunyoung, "Semantics based clustering through cover-kmeans with ontovsm for information retrieval" (2020). Kean Publications. 1275.
https://digitalcommons.kean.edu/keanpublications/1275