Measuring Textual Redundancy, Lexical Richness, and Veracity: a Multi-Metric Approach to Text Evaluation

Authors

I. Kobylin, V. Biehunova, D. Tsyban, V. Kovalchuk

DOI:

https://doi.org/10.15407/intechsys.2026.02.025

Keywords:

Computational Linguistics, Factual Accuracy, Lexical Diversity, Natural Language Processing, Text Analysis, Text Quality, Type-Token Ratio, Wateriness Coefficient

Abstract

With the exponential growth of textual data, effective analytical methods are essential. This paper introduces a multi-metric approach to text evaluation that focuses on quantifying textual redundancy, lexical richness, and veracity. We explore the theoretical underpinnings and practical applications of the Wateriness Coefficient for redundancy, the Type-Token Ratio (TTR) and related lexical diversity indices for vocabulary richness, and the Factual Accuracy Score for informational integrity. Together, these metrics offer a comprehensive framework for assessing text quality beyond traditional methods, enabling deeper insight into the linguistic characteristics and informational reliability of diverse textual datasets.
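The abstract names the metrics but does not state their formulas. As a rough sketch only, the Python below computes the standard TTR (distinct word types divided by total tokens) and the moving-average TTR (MATTR) of Covington and McFall (2010), plus two stand-in proxies: "wateriness" taken as the share of stop/filler words, and a factual accuracy score taken as the fraction of claims that some external verifier marks as correct. The regex tokenizer, the tiny `FILLER_WORDS` list, the default window size, and both proxy definitions are assumptions made for illustration, not the authors' definitions.

```python
import re
from typing import Iterable, Sequence

# ASSUMPTION: a tiny, hypothetical stop/filler-word list for illustration only;
# the paper's own word list and any weighting are not reproduced here.
FILLER_WORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "this", "that", "very", "really", "just", "basically",
})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (word characters and apostrophes)."""
    return re.findall(r"[\w']+", text.lower())


def type_token_ratio(tokens: Sequence[str]) -> float:
    """TTR = number of distinct word types / total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving-average TTR: the mean TTR over every run of `window`
    consecutive tokens. Falls back to plain TTR when the text is shorter
    than the window."""
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def wateriness(tokens: Sequence[str], filler: frozenset = FILLER_WORDS) -> float:
    """ASSUMED proxy for the Wateriness Coefficient: share of tokens that are filler words."""
    if not tokens:
        return 0.0
    return sum(t in filler for t in tokens) / len(tokens)


def factual_accuracy(verdicts: Iterable[bool]) -> float:
    """ASSUMED proxy for the Factual Accuracy Score: fraction of extracted
    claims judged correct. Claim extraction and verification are out of scope."""
    verdicts = list(verdicts)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat and the dog sat on the rug.")
    print(round(type_token_ratio(tokens), 3))  # 8 types / 13 tokens -> 0.615
    print(round(mattr(tokens, window=5), 3))   # mean over 9 windows -> 0.911
    print(round(wateriness(tokens), 3))        # 7 filler tokens / 13 -> 0.538
    print(factual_accuracy([True, True, False, True]))  # 0.75
```

Plain TTR falls as a text grows longer, because common words keep repeating. This length sensitivity is why windowed indices such as MATTR are usually preferred when comparing texts of different lengths. The window size is a tunable parameter. The demo uses a very small window only so that the example sentence yields several windows.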

Published

2026-06-01

How to Cite

Kobylin, I., Biehunova, V., Tsyban, D., & Kovalchuk, V. (2026). Measuring Textual Redundancy, Lexical Richness, and Veracity: a Multi-Metric Approach to Text Evaluation. Information Technologies and Systems, 8(2), 25–45. https://doi.org/10.15407/intechsys.2026.02.025

Issue

Vol. 8 No. 2 (2026)

Section

Intellectual Information Technologies