Automated Authorship Identification of Program Code Based on a Metric System

Authors

Adamchuk A., Sushchuk-Slusarenko V., Dychka A.

DOI:

https://doi.org/10.15407/intechsys.2025.02.081

Keywords:

metrics, attribution, source code, authorship identification

Abstract

The paper reviews existing methods for automated authorship attribution of program code and proposes an original method based on a system of metrics grounded in the "fingerprinting" technique. The metrics capture the individual stylistic features of a programmer independently of the programming language.
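The abstract does not list the paper's actual metrics, so the following is only a minimal sketch of the general fingerprinting idea. It assumes a handful of hypothetical, language-independent layout metrics (average line length, share of blank lines, tab versus space indentation, trailing whitespace, and underscore use in identifiers) and attributes an unknown file to the author whose averaged profile is nearest by Euclidean distance. The names `fingerprint`, `build_profiles`, and `attribute` are illustrative and are not taken from the paper.

```python
import math
import re
from statistics import mean

# Hypothetical style metrics chosen for illustration only; they are NOT the
# metric system defined in the paper.
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fingerprint(source: str) -> list[float]:
    """Return a vector of language-independent style metrics for one file."""
    lines = source.splitlines()
    if not lines:
        return [0.0] * 5
    non_blank = [ln for ln in lines if ln.strip()]
    indented = [ln for ln in non_blank if ln[0] in " \t"]
    words = WORD_RE.findall(source)
    return [
        # 1. Average length of non-blank lines, in characters.
        float(mean(len(ln) for ln in non_blank)) if non_blank else 0.0,
        # 2. Share of blank lines (vertical spacing habit).
        1 - len(non_blank) / len(lines),
        # 3. Share of indented lines that start with a tab rather than a space.
        sum(ln[0] == "\t" for ln in indented) / len(indented) if indented else 0.0,
        # 4. Share of lines that end with trailing whitespace.
        sum(ln != ln.rstrip() for ln in lines) / len(lines),
        # 5. Share of identifier-like tokens containing an underscore (snake_case).
        sum("_" in w for w in words) / len(words) if words else 0.0,
    ]


def build_profiles(samples: dict[str, list[str]]) -> dict[str, list[float]]:
    """Average the fingerprints of each author's known files (non-empty lists)."""
    profiles = {}
    for author, files in samples.items():
        vectors = [fingerprint(src) for src in files]
        profiles[author] = [float(mean(col)) for col in zip(*vectors)]
    return profiles


def attribute(source: str, profiles: dict[str, list[float]]) -> str:
    """Return the author whose profile is nearest (Euclidean) to the file."""
    fp = fingerprint(source)
    return min(profiles, key=lambda author: math.dist(fp, profiles[author]))
```

Because the metrics read only raw text, the same functions apply to files in any language. A practical system would also normalize each metric (here the line-length feature is on a much larger scale than the ratios and dominates the distance) and would use many more samples per author.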

References

Frantzeskou G., Stamatatos E., Gritzalis S., Chaski C. Identifying Authorship by Byte-Level N-Grams: The Source Code Author Profile (SCAP) Method. International Journal of Digital Evidence, Trier, Germany, 2007, Vol. 6 (1), 139–148. URL: https://www.researchgate.net/publication/220542545_Identifying_Authorship_by_Byte-Level_N-Grams_The_Source_Code_Author_Profile_SCAP_Method [Accessed 12 Nov. 2024].

Frantzeskou G., MacDonell S., Stamatatos E., Georgiou S., Gritzalis S. The significance of user-defined identifiers in Java source code authorship identification. Computer Systems Science and Engineering, Samos, Greece, 2011.

Gray A., Sallis P., MacDonell S. IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination): a dictionary-based system for extracting source code metrics for software forensics. IEEE Computer Society Press, Dunedin, New Zealand, 1998, 252–259. https://doi.org/10.1109/SEEP.1998.707658

Sallis P., Aakjaer A., MacDonell S. Software Forensics: Old Methods for a New Science. IEEE Computer Society Press, Dunedin, New Zealand, 1996, 367–371. https://doi.org/10.1109/SEEP.1996.534037

Ding H., Samadzadeh M.H. Extraction of Java program fingerprints for software authorship identification. The Journal of Systems and Software, 2004, Vol. 72 (1), 49–57. https://doi.org/10.1016/S0164-1212(03)00049-9

Abuhamad M., AbuHmed T., Mohaisen A., Nyang D.H. Large-scale and language-oblivious code authorship identification. Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, 2018, 101–114. https://doi.org/10.1145/3243734.3243738

Guo D. et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. Proceedings of ICLR 2021. URL: https://www.researchgate.net/publication/344294734_GraphCodeBERT_Pre-training_Code_Representations_with_Data_Flow [Accessed 14 Nov. 2024].

Abbasi A., Javed A.R., Iqbal F. et al. Authorship identification using ensemble learning. Scientific Reports, 2022, Vol. 12. https://doi.org/10.1038/s41598-022-13690-4

Published

2025-07-17

How to Cite

Adamchuk, A., Sushchuk-Slusarenko, V., & Dychka, A. (2025). Automated Authorship Identification of Program Code Based on a Metric System. Information Technologies and Systems, 2(2), 81–89. https://doi.org/10.15407/intechsys.2025.02.081

Issue

Vol. 2, No. 2 (2025)

Section

Intellectual Information Technologies