Architectural Evolution of Transformer Models in NLP: A Comparative Survey of Recent Developments
DOI: https://doi.org/10.33022/ijcs.v14i5.4984

Keywords: XLM-RoBERTa, XLM-R, NLP Model, BERT, Transformer Model, Natural Language Processing

Abstract
This literature review examines the impact and advancements of XLM-RoBERTa in the field of multilingual natural language processing. As language technologies increasingly transcend linguistic boundaries, XLM-RoBERTa has emerged as a pivotal cross-lingual model that extends the capabilities of its predecessors. Through comprehensive pre-training on multilingual corpora spanning 100 languages, this model demonstrates remarkable zero-shot cross-lingual transfer capabilities while maintaining competitive performance on monolingual benchmarks. This review synthesizes research findings on XLM-RoBERTa's architecture, pre-training methodology, and performance across diverse NLP tasks including named entity recognition, question answering, and text classification. By examining comparative analyses with other multilingual models, we identify key strengths, limitations, and potential directions for future research. The findings underscore XLM-RoBERTa's significance in advancing language-agnostic representations and bridging the performance gap between high-resource and low-resource languages, with substantial implications for global accessibility of language technologies.
License
Copyright (c) 2025 Diyar Waysi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.