Measuring Textual Redundancy, Lexical Richness, and Veracity: a Multi-Metric Approach to Text Evaluation
DOI:
https://doi.org/10.15407/intechsys.2026.02.025
Keywords:
Computational Linguistics, Factual Accuracy, Lexical Diversity, Natural Language Processing, Text Analysis, Text Quality, Type-Token Ratio, Wateriness Coefficient
Abstract
With the exponential growth of textual data, effective methods for analyzing it are essential. This paper introduces a multi-metric approach to text evaluation that quantifies textual redundancy, lexical richness, and veracity. We explore the theoretical underpinnings and practical applications of the Wateriness Coefficient for redundancy, the Type-Token Ratio (TTR) and various Lexical Diversity Indices for vocabulary richness, and the Factual Accuracy Score for informational integrity. Together, these metrics offer a comprehensive framework for assessing text quality beyond traditional methods, enabling deeper insight into the linguistic characteristics and informational reliability of diverse textual datasets.
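As a brief illustration of the simplest metric named above, the Type-Token Ratio is conventionally defined as the number of distinct word forms (types) divided by the total number of words (tokens). The following is a minimal sketch under that standard definition, not the paper's implementation: the function names `tokenize` and `type_token_ratio` are hypothetical, and the lowercase regular-expression tokenizer is an assumption made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (assumed, deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Return distinct tokens (types) divided by total tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "The cat sat on the mat"
    tokens = tokenize(sample)
    # 6 tokens, 5 distinct types ("the" occurs twice)
    print(f"TTR = {type_token_ratio(tokens):.3f}")
```

Because plain TTR tends to fall as a text grows longer, values are only directly comparable between texts of similar length.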
License
Copyright (c) 2026. The copyright holder is the publisher of the paper (the Institute of Information Technologies and Systems of the NAS of Ukraine) and/or the publisher of the paper (PH "Akademperiodika" of the NAS of Ukraine), to which the Institute of Information Technologies and Systems of the NAS of Ukraine has granted, on the basis of a sublicense publishing agreement, the right to publish the work and the right to indicate the publisher after the copyright sign.

This paper is Open Access and is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).