Data Mining for Research Purposes
DOI:
https://doi.org/10.15407/sofs2025.02.027
Keywords:
data mining, empirical regularity, training sample, feature space, data design
Abstract
The article examines whether Data Mining (DM) methods can be used in scientific research aimed at cognition, which requires an answer to the question ‘why?’, and how DM methods relate to general questions of cognition. The research methodology lies at the intersection of mathematics, statistics, computer science, and management theory. Alongside general scientific methods of cognition (generalization, systematization, abstraction, induction and deduction, analysis and synthesis, analogy, comparison, formalization, modelling, classification, categorization), the study applies special methods of analysis (logical, structural, functional), the descriptive research method, and interpretive methodology. The relationship of DM methods to the general theory of cognition is clarified. It is shown that DM methods are a new, human-machine methodology of empirical cognition. The authors point out the limitations of these methods: their results are empirical regularities (ERs), i.e. probable (rather than exact) knowledge. Instructions or decision-making rules can be formulated on the basis of ERs, but such rules remain at the first, lowest, empirical level of knowledge, which is typical for business. The main goal of scientific research, however, is to obtain genuinely new knowledge that can be translated into innovations and provide a higher-order competitive advantage. ERs can serve as ‘blanks’ for formulating, testing and selecting hypotheses in order to deepen knowledge of the subject area and obtain more reliable knowledge that brings the researcher closer to understanding. If understanding is gained, something has been found that underlies the knowledge obtained, and this can become a new method of solving the tasks at hand. Knowledge is information, whereas understanding is the ability to draw conclusions.
Today, this is the limit of applicability of all DM tools: they provide knowledge in the form of empirical and working hypotheses, but do not provide understanding, without which it is impossible to move to the next level of knowledge, the scientific method.
License
Copyright (c) 2025 Akademperiodyka of the NAS of Ukraine

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.



