REVIEW AND SELECTION OF CLUSTERING ALGORITHMS FOR DATASETS IN THE CONTEXT OF COUNTRIES' DECARBONIZATION

Olena ZHYTKEVYCH

doi:10.15407/economyukr.2025.11.043

Authors

Olena ZHYTKEVYCH, Kyiv National Economic University named after Vadym Hetman, https://orcid.org/0000-0003-2042-8795

DOI:

https://doi.org/10.15407/economyukr.2025.11.043

Keywords:

countries’ decarbonization potential; clustering algorithms; Self-Organizing Maps; clustering validation metrics

Abstract

Assessing and modeling a country's decarbonization potential is crucial for ensuring sustainable economic and social development at the micro and macro levels. Given the increasing relevance of Big Data in decarbonization research and the integration of clustering algorithms, it is important to identify clustering methods that are scalable, robust and suitable for the dataset used. The paper is structured as follows. The initial stage comprises a comprehensive review and analysis of the relevant literature. The literature review suggests that there is no single optimal clustering method, so a comparative approach adapted to the nature of the dataset may provide the best results. The next stage provides an overview of widely used clustering methodologies applied to prepared datasets, particularly in the context of decarbonization. The quality of clustering is then assessed using internal clustering validation metrics. The study was performed on a prepared dataset of 14 normalized key indicators used to determine the decarbonization potential of 41 countries over a 10-year period. Finally, three clustering methods, K-means, Gaussian mixture models (GMM) and Self-Organizing Maps (SOM), were tested on this dataset to assess the decarbonization potential of different countries, including Ukraine. The study concludes that SOM with 3 and 5 clusters is the most suitable clustering method for a dataset used to determine the decarbonization potential of countries, in particular Ukraine. The clustering results can be used to adapt best international practices to the Ukrainian energy infrastructure, which is undergoing significant adverse transformations as a result of the war.
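
The comparative approach summarized in the abstract can be illustrated with a minimal sketch. It is not the author's code or data: the dataset is a synthetic stand-in (410 rows, mimicking 41 countries over 10 years, and 14 features), the use of scikit-learn, min-max normalization and a diagonal covariance for GMM are assumptions, and SOM is left out because scikit-learn does not provide it. The three internal metrics are the silhouette, Davies-Bouldin and Caliński-Harabasz indices from the reference list.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real indicators (assumption): 410 country-year
# rows, 14 features, 3 latent groups. The paper's data is not reproduced here.
X_raw, _ = make_blobs(n_samples=410, n_features=14, centers=3, random_state=0)
X = MinMaxScaler().fit_transform(X_raw)  # normalize each indicator to [0, 1]


def internal_metrics(data, labels):
    """Return three internal validation metrics for one partition."""
    return {
        "silhouette": silhouette_score(data, labels),                # higher is better
        "davies_bouldin": davies_bouldin_score(data, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(data, labels),  # higher is better
    }


results = {}
for k in (3, 5):  # the two cluster counts mentioned in the abstract
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    gmm_labels = GaussianMixture(
        n_components=k, covariance_type="diag", random_state=0
    ).fit_predict(X)
    results[("kmeans", k)] = internal_metrics(X, km_labels)
    results[("gmm", k)] = internal_metrics(X, gmm_labels)

for (method, k), scores in sorted(results.items()):
    line = ", ".join(f"{name}={value:.3f}" for name, value in scores.items())
    print(f"{method:>6} k={k}: {line}")
```

On real indicators the metrics will often disagree with one another, which is one reason to compare several methods and cluster counts side by side rather than rely on a single score.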

References

Kosse, I. (2023). Rebuilding Ukraine’s infrastructure after the war. Policy Notes and Reports 72. 24 p. URL: https://wiiw.ac.at/rebuilding-ukraine-s-infrastructure-after-the-war-dlp-6621.pdf

Zhou, W., Zhou, J., Hu, G. (2023). Research of varying patterns of CO₂ emissions in 182 countries based on K-means method. Applied and Computational Engineering. No. 6(1). P. 1597–1606. https://doi.org/10.54254/2755-2721/6/20230480

Novo, R., Marocco, P., Giorgi, G., Lanzini, A., Santarelli, M., Mattiazzo, G. (2022). Planning the decarbonisation of energy systems: The importance of applying time series clustering to long-term models. Energy Conversion and Management: X. Vol. 15. 100274. https://doi.org/10.1016/j.ecmx.2022.100274

Inekwe, J., Valadkhani, A., Smyth, R. (2020). Drivers of carbon dioxide emissions: An empirical investigation using hierarchical and non-hierarchical clustering methods. Environmental and Ecological Statistics. No. 20(4). P. 1–40. https://doi.org/10.1007/s10651-019-00433-4

Li, Z., Wang, C., Li, Y. (2020). Using clustering algorithms to characterise uncertain long-term decarbonisation pathways. Applied Energy. Vol. 268. 114947. https://doi.org/10.1016/j.apenergy.2020.114947

Hu, Y., Weng, L. (2024). Net-zero energy transition in ASEAN countries: The evolutionary model brings novel perspectives to the cooperative mechanism of climate governance. Journal of Environmental Management. Vol. 351. 119999. https://doi.org/10.1016/j.jenvman.2023.119999

Chontanawat, J. (2018). Decomposition analysis of CO₂ emissions in ASEAN: An extended IPAT model. Energy Procedia. Vol. 153. P. 186–190. https://doi.org/10.1016/j.egypro.2018.10.057

Anser, M., Ali, S., Umair, M., Javid, R., Mirzaliev, S. (2024). Energy consumption, technological innovation, and economic growth in BRICS: A GMM panel VAR framework analysis. Energy Strategy Reviews. Vol. 56. 101587. https://doi.org/10.1016/j.esr.2024.101587

Jain, A. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters. Vol. 31. Iss. 8. P. 651–666. https://doi.org/10.1016/j.patrec.2009.09.011

Murtagh, F., Contreras, P. (2012). Algorithms for hierarchical clustering: an overview. WIREs Data Mining and Knowledge Discovery. Vol. 2. Iss. 1. P. 86–97. https://doi.org/10.1002/widm.53

Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory. Vol. 28. Iss. 2. P. 129–137. https://doi.org/10.1109/TIT.1982.1056489

Ester, M., Kriegel, H., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press. URL: https://cdn.aaai.org/KDD/1996/KDD96-037.pdf

McLachlan, G., Peel, D. (2000). Finite mixture models. John Wiley & Sons. 427 p. https://doi.org/10.1002/0471721182

Kohonen, T. (2012). Self-organizing maps. Berlin, Springer. 502 p. https://doi.org/10.1007/978-3-642-56927-2

Datta, S., Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics. Vol. 19. Iss. 4. P. 459–466. https://doi.org/10.1093/bioinformatics/btg025

Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. Vol. 20. P. 53–65. https://doi.org/10.1016/0377-0427(87)90125-7

Manning, C., Raghavan, P., Schütze, H. (2023). Introduction to Information Retrieval. Cambridge, Cambridge University Press. 542 p. URL: https://nlp.stanford.edu/IR-book/

Davies, D., Bouldin, D. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 1. Iss. 2. P. 224–227. https://doi.org/10.1109/TPAMI.1979.4766909

Xu, R., Wunsch, D. (2008). Clustering. Wiley IEEE Press. 368 p. URL: https://www.wiley.com/en-us/Clustering-p-9780470276808

Caliński, T., Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics: Theory and Methods. Vol. 3. Iss. 1. P. 1–27. https://doi.org/10.1080/03610927408827101

Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems. Vol. 17. P. 107–145. https://doi.org/10.1023/A:1012801612483

Hubert, L., Arabie, P. (1985). Comparing partitions. Journal of Classification. Vol. 2. P. 193–218. https://doi.org/10.1007/BF01908075

Zhou, S., Xu, H., Zheng, Z., Chen, J., Li, Z., Bu, J., Wu, J., Wang, X., Zhu, W., Ester, M. (2024). A comprehensive survey on deep clustering: Taxonomy, challenges, and future directions. ACM Computing Surveys. Vol. 57. Iss. 3. P. 1–38. https://doi.org/10.1145/3689036

Strehl, A., Ghosh, J. (2002). Cluster ensembles — a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research. Vol. 3. P. 583–617. https://dl.acm.org/doi/10.1162/153244303321897735
