Automated Authorship Identification of Program Code Based on a Metric System
DOI: https://doi.org/10.15407/intechsys.2025.02.081
Keywords: metrics, attribution, source code, authorship identification
Abstract
The paper reviews existing methods for automated authorship attribution of program code and proposes an original method built on a system of metrics grounded in the “fingerprinting” technique. The metrics capture a programmer's individual stylistic features independently of the programming language.
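As an illustration only, a metric-based fingerprint can be sketched as a small vector of layout ratios computed from raw text, compared by distance against per-author profiles. The abstract does not list the paper's actual metrics, so the four ratios below, the 80-character threshold, and the names `style_fingerprint`, `distance`, and `closest_author` are assumptions chosen for this sketch, not the proposed method.

```python
import math


def style_fingerprint(source: str) -> dict[str, float]:
    """Compute a few language-agnostic layout ratios (illustrative metrics only)."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    indented = [ln for ln in non_blank if ln[0] in " \t"]
    return {
        # Share of lines that are empty or whitespace-only.
        "blank_line_ratio": (len(lines) - len(non_blank)) / (len(lines) or 1),
        # Share of non-blank lines longer than 80 characters (assumed threshold).
        "long_line_ratio": sum(len(ln) > 80 for ln in non_blank) / (len(non_blank) or 1),
        # Share of indented lines whose indentation starts with a tab.
        "tab_indent_ratio": sum(ln[0] == "\t" for ln in indented) / (len(indented) or 1),
        # Share of all characters that are whitespace.
        "whitespace_ratio": sum(ch.isspace() for ch in source) / (len(source) or 1),
    }


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two fingerprints with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def closest_author(sample: str, profiles: dict[str, dict[str, float]]) -> str:
    """Return the author whose stored fingerprint is nearest to the sample's."""
    fp = style_fingerprint(sample)
    return min(profiles, key=lambda name: distance(fp, profiles[name]))


# Hypothetical author profiles built from one tiny snippet each.
profiles = {
    "alice": style_fingerprint("int main() {\n\treturn 0;\n}\n"),
    "bob": style_fingerprint("int main() {\n    return 0;\n}\n"),
}
print(closest_author("void f() {\n\tg();\n}\n", profiles))
```

All four metrics are ratios in the range 0 to 1, so no single one dominates the distance. A realistic profile would average fingerprints over many files per author and would likely need far more metrics than this sketch uses.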
License
Copyright (c) 2025 Information Technologies and Systems

This open-access work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).