Abstract—This paper presents an algorithm that automatically determines the number of clusters in a given input data set, under the assumption that the data follow a mixture of Gaussians. Our algorithm extends the Expectation-Maximization (EM) clustering approach: it starts by treating the data as a single cluster and recursively splits one of the clusters to find a tighter fit. After each split, an information criterion is used to choose between the current model and the previous one. We build this approach on prior work on both the K-Means and EM algorithms. We further extend the algorithm with a cluster-splitting step based on Principal Direction Divisive Partitioning (PDDP), which improves accuracy and efficiency.
Index Terms—clustering, expectation-maximization, mixture of Gaussians, principal direction divisive partitioning
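The split-and-select loop summarized in the abstract can be sketched in plain Python as below. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions that the abstract leaves open. The information criterion is taken to be BIC. Each Gaussian has a diagonal covariance. The cluster chosen for splitting is the one with the most hard-assigned points. The search stops at the first split that fails to lower BIC. PDDP is applied as one bisection of the chosen cluster along its principal direction, which is found by power iteration. All names (`em_pddp`, `pddp_split`, `bic`, `fit_component`, `VAR_FLOOR`) are hypothetical.

```python
import math

VAR_FLOOR = 1e-6  # hypothetical floor that keeps diagonal variances positive

def log_gauss(x, mean, var):
    # Log-density of a Gaussian with diagonal covariance (an assumption).
    return -0.5 * sum(math.log(2 * math.pi * s) + (xi - m) ** 2 / s
                      for xi, m, s in zip(x, mean, var))

def log_sum_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_component(points, weight):
    # Moment estimate of one component: (weight, mean, diagonal variance).
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [max(sum((p[j] - mean[j]) ** 2 for p in points) / n, VAR_FLOOR)
           for j in range(d)]
    return (weight, mean, var)

def em(points, comps, iters=200, tol=1e-6):
    # Standard EM refinement; stops when the log-likelihood stops improving.
    n, d = len(points), len(points[0])
    prev = -math.inf
    for _ in range(iters):
        resp, ll = [], 0.0  # E-step: resp[i][k] = P(component k | point i)
        for x in points:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in comps]
            total = log_sum_exp(logs)
            ll += total
            resp.append([math.exp(lg - total) for lg in logs])
        new = []  # M-step: re-estimate weight, mean and variance
        for k in range(len(comps)):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:  # drop a component that lost all its points
                continue
            mean = [sum(r[k] * x[j] for r, x in zip(resp, points)) / nk
                    for j in range(d)]
            var = [max(sum(r[k] * (x[j] - mean[j]) ** 2
                           for r, x in zip(resp, points)) / nk, VAR_FLOOR)
                   for j in range(d)]
            new.append((nk / n, mean, var))
        comps = new
        if ll - prev < tol:
            break
        prev = ll
    return comps

def bic(points, comps):
    # Assumed criterion: BIC = -2 log L + p ln n (lower is better).
    n, d = len(points), len(points[0])
    ll = sum(log_sum_exp([math.log(w) + log_gauss(x, m, v)
                          for w, m, v in comps]) for x in points)
    p = len(comps) * (2 * d + 1) - 1  # means, variances, weights (sum to 1)
    return -2 * ll + p * math.log(n)

def pddp_split(points, iters=50):
    # Bisect by the sign of each point's projection on the principal direction.
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = max(centered, key=lambda c: sum(cj * cj for cj in c))
    for _ in range(iters):  # power iteration on the scatter matrix
        w = [0.0] * d
        for c in centered:
            s = sum(cj * vj for cj, vj in zip(c, v))
            for j in range(d):
                w[j] += s * c[j]
        norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
        v = [wj / norm for wj in w]
    left, right = [], []
    for p, c in zip(points, centered):
        (left if sum(cj * vj for cj, vj in zip(c, v)) <= 0 else right).append(p)
    return left, right

def em_pddp(points, max_k=10):
    comps = em(points, [fit_component(points, 1.0)])  # start with one cluster
    best = bic(points, comps)
    while len(comps) < max_k:
        members = [[] for _ in comps]  # hard-assign points to clusters
        for x in points:
            k = max(range(len(comps)), key=lambda j: math.log(comps[j][0])
                    + log_gauss(x, comps[j][1], comps[j][2]))
            members[k].append(x)
        # Assumption: split the cluster with the most assigned points.
        target = max(range(len(comps)), key=lambda j: len(members[j]))
        left, right = pddp_split(members[target])
        if not left or not right:
            break
        w, size = comps[target][0], len(members[target])
        candidate = em(points, comps[:target] + comps[target + 1:] + [
            fit_component(left, w * len(left) / size),
            fit_component(right, w * len(right) / size)])
        score = bic(points, candidate)
        if score >= best:  # split does not pay for itself: keep previous model
            break
        comps, best = candidate, score
    return comps
```

For example, on three well-separated two-dimensional blobs centred at x = 0, 10 and 30, `em_pddp(points)` would be expected to return three components, one per blob. The two child components are initialized from the two halves of the PDDP cut rather than from random points. This gives EM a deterministic starting point that already lies close to a good split, which is the motivation the abstract gives for using PDDP.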
Department of Computer Engineering, Delhi College of Engineering, India
Cite: Ujjwal Das Gupta, Vinay Menon and Uday Babbar, "Parameter Selection for EM Clustering Using Information Criterion and PDDP," International Journal of Engineering and Technology, vol. 2, no. 4, pp. 340-344, 2010.