Document clustering is the process of organizing an electronic corpus of documents into subgroups of documents with similar textual characteristics. Numerous statistical algorithms have previously been applied to cluster data, including text documents. More recently, there have been attempts to improve clustering performance with optimization-based algorithms such as evolutionary algorithms. As a result, document clustering with evolutionary algorithms has emerged as a topic that has attracted growing attention in recent years. This article presents an updated review dedicated entirely to evolutionary algorithms designed for document clustering. It first provides a comprehensive examination of the document clustering model, describing its components and related concepts. It then presents and analyzes the main research works on this topic. Finally, it collects and classifies the objective functions used across the reviewed research articles. The article concludes by addressing some important questions and challenges that may be the subject of future work.

The objective function (or fitness function) is the measure that evaluates the optimality of the candidate solutions an evolutionary algorithm generates in the search space. In clustering, the fitness function measures how adequate a given partitioning is. It must therefore be formulated carefully, taking into account that clustering is an unsupervised process, so no ground-truth labels are available to guide the search. Different objective functions lead to different solutions even when the same evolutionary algorithm is used. In addition, the fitness function may be formulated either as a minimization or as a maximization problem. Furthermore, the algorithm may be formulated with a single objective function or with several. To summarize, "choosing optimizati...
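To make the role of the fitness function more concrete, here is a minimal Python sketch under stated assumptions: each document is a term-weight vector (for example TF-IDF weights), a candidate solution is a vector of cluster labels, and fitness is the average cosine distance of documents to their cluster centroid, a criterion reported in some of the literature as ADDC, where lower is better. The search loop is a deliberately simple (1+1) evolutionary algorithm (mutate one label, keep the child if it is no worse). It was chosen for brevity and is not taken from any of the reviewed works. All function names and the toy data are hypothetical.

```python
import math
import random

Vector = list[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def addc_fitness(docs: list[Vector], labels: list[int], k: int) -> float:
    """Average distance of documents to their cluster centroid (lower is better).

    For each non-empty cluster, take the mean distance of its members to the
    cluster centroid; the fitness is the mean of these values. Empty clusters
    are skipped.
    """
    per_cluster = []
    for c in range(k):
        members = [d for d, label in zip(docs, labels) if label == c]
        if not members:
            continue
        cen = centroid(members)
        per_cluster.append(sum(cosine_distance(d, cen) for d in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def evolve(docs, k, fitness, minimize=True, generations=500, seed=0):
    """Hypothetical (1+1) evolutionary algorithm over cluster-label vectors."""
    rng = random.Random(seed)
    sign = 1.0 if minimize else -1.0  # negate scores so maximization becomes minimization
    parent = [rng.randrange(k) for _ in docs]  # random initial partition
    parent_score = sign * fitness(docs, parent, k)
    for _ in range(generations):
        child = parent[:]
        i = rng.randrange(len(child))
        child[i] = rng.randrange(k)  # mutation: move one document to a random cluster
        child_score = sign * fitness(docs, child, k)
        if child_score <= parent_score:  # survivor selection: keep the child if no worse
            parent, parent_score = child, child_score
    return parent, sign * parent_score


if __name__ == "__main__":
    # Toy term-weight vectors (hypothetical data): docs 0-1 share one topic, docs 2-3 another.
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
    labels, score = evolve(docs, k=2, fitness=addc_fitness)
    print(labels, round(score, 4))
```

The `minimize` flag shows how the same loop can handle a maximization criterion (for example, mean intra-cluster similarity) by negating the score. A multi-objective formulation would instead return a tuple of scores and replace the `<=` comparison with a Pareto-dominance test.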