Development of a Stability-Dispersion Adaptive Weighted K-Means Method for Feature-Sensitive Clustering

Ramli Rumeon; Surman Siloyanan

Authors

Ramli Rumeon, Universitas Pattimura, Indonesia
Surman Siloyanan

Keywords:

Clustering, Feature weighting, Iris dataset, K-Means.

Abstract

This study develops Stability-Dispersion Adaptive Weighted K-Means (SDAW-K-Means), an extension of classical K-Means that updates feature weights according to within-cluster dispersion. Classical K-Means treats all standardized features equally, although some features may be more relevant for cluster separation than others. The proposed method estimates feature weights iteratively: features with smaller within-cluster dispersion receive larger weights, while features with larger dispersion, treated as less informative, receive smaller weights. The empirical illustration uses the public Iris dataset from the UCI Machine Learning Repository, loaded through scikit-learn. The results indicate that the learned weights are interpretable and that the weighting can improve agreement with the reference labels, as measured by the adjusted Rand index. The article contributes a transparent feature-weighted K-Means formulation for applied clustering research.
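The iterative weighting idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weight-update formula or how the stability component enters, so the rule used here (weights proportional to the inverse of per-feature within-cluster dispersion, normalized to sum to one) is an assumption made for illustration only and should not be read as the authors' SDAW-K-Means. The function name `dispersion_weighted_kmeans` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def dispersion_weighted_kmeans(X, n_clusters, n_rounds=10, eps=1e-12, seed=0):
    """Alternate K-Means on rescaled features with an inverse-dispersion
    weight update. Returns the labels of the last fit and the final weights.
    The update rule is an assumed stand-in, not the published formula."""
    n_features = X.shape[1]
    weights = np.full(n_features, 1.0 / n_features)  # start from equal weights
    labels = None
    for _ in range(n_rounds):
        # Weighted squared Euclidean distance equals the plain squared
        # distance computed on X * sqrt(w).
        Xw = X * np.sqrt(weights)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Xw)
        labels = km.labels_
        # Per-feature within-cluster sum of squares on the unweighted features.
        dispersion = np.zeros(n_features)
        for k in range(n_clusters):
            members = X[labels == k]
            dispersion += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
        # Assumed rule: smaller dispersion gives a larger weight.
        new_weights = 1.0 / (dispersion + eps)
        new_weights /= new_weights.sum()
        converged = np.allclose(new_weights, weights, atol=1e-6)
        weights = new_weights
        if converged:
            break
    return labels, weights


iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # standardized features

labels, weights = dispersion_weighted_kmeans(X, n_clusters=3)
baseline = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

ari_weighted = adjusted_rand_score(iris.target, labels)
ari_baseline = adjusted_rand_score(iris.target, baseline)

print(dict(zip(iris.feature_names, weights.round(3))))
print("ARI (weighted sketch):", ari_weighted)
print("ARI (plain K-Means):  ", ari_baseline)
```

Running the script prints the learned weight for each Iris feature together with the adjusted Rand index of the weighted sketch and of plain K-Means on the same standardized data. Whether the weighted variant scores higher depends on the update rule and the initialization, so the comparison is only indicative.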

References

[1] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, 1967, pp. 281-297.

[2] S. P. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, 1982, doi: 10.1109/TIT.1982.1056489.

[3] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A K-means clustering algorithm," Journal of the Royal Statistical Society: Series C, vol. 28, no. 1, pp. 100-108, 1979.

[4] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proc. 18th ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027-1035.

[5] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010, doi: 10.1016/j.patrec.2009.09.011.

[6] R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, 2005, doi: 10.1109/TNN.2005.845141.

[7] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179-188, 1936.

[8] UCI Machine Learning Repository, Iris Dataset. Irvine, CA, USA: University of California, Irvine.

[9] P. J. Rousseeuw, "Silhouettes: A graphical aid to the interpretation and validation of cluster analysis," Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65, 1987, doi: 10.1016/0377-0427(87)90125-7.

[10] L. Hubert and P. Arabie, "Comparing partitions," Journal of Classification, vol. 2, no. 1, pp. 193-218, 1985.

[11] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society: Series B, vol. 63, no. 2, pp. 411-423, 2001.

[12] M. E. Celebi, H. A. Kingravi, and P. A. Vela, "A comparative study of efficient initialization methods for the K-means clustering algorithm," Expert Systems with Applications, vol. 40, no. 1, pp. 200-210, 2013, doi: 10.1016/j.eswa.2012.07.021.

[13] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ, USA: Prentice Hall, 1988.

[14] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ, USA: Wiley, 1990.

[15] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224-227, 1979, doi: 10.1109/TPAMI.1979.4766909.

[16] J. C. Dunn, "Well-separated clusters and optimal fuzzy partitions," Journal of Cybernetics, vol. 4, no. 1, pp. 95-104, 1974.

[17] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.

[18] D. M. Witten and R. Tibshirani, "A framework for feature selection in clustering," Journal of the American Statistical Association, vol. 105, no. 490, pp. 713-726, 2010, doi: 10.1198/jasa.2010.tm09415.

[19] G. Gan, C. Ma, and J. Wu, Data Clustering: Theory, Algorithms, and Applications. Philadelphia, PA, USA: SIAM, 2007.

[20] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach, 2nd ed. Boca Raton, FL, USA: CRC Press, 2012.