LARGE LANGUAGE MODELS IN EDUCATIONAL MEASUREMENT OF KAZAKH LANGUAGE PROFICIENCY
DOI: https://doi.org/10.63597/UTO3105-4161.2025.3.3.008

Keywords: Large Language Models, Artificial Intelligence, Pedagogical Measurement, Unified National Test, Kazakh Language, Educational Assessment

Abstract
This study evaluates the performance of large language models (LLMs) in assessing Kazakh language proficiency within the context of the Unified National Test (UNT) in Kazakhstan. The primary objective is to examine the accuracy, error patterns, and psychometric characteristics of five state-of-the-art LLMs—Gemini 2.5 Pro Preview, Claude 3.7 Sonnet, Deepseek R1, Qwen, and Llama 3.1-405B-Instruct—on 138 multiple-choice questions (MCQs) from the 2024 UNT Kazakh language test. The methodology involved a zero-shot evaluation with standardized prompts, ensuring no external data access, and employed statistical analyses, including Cochran’s Q test, McNemar’s tests, and Generalized Estimating Equations (GEE) logistic regression, to assess model performance across difficulty levels and linguistic topics. Results indicate that Gemini achieved the highest accuracy (90.6%), significantly outperforming other models, while Llama showed the lowest (37.7%). Performance varied by difficulty and topic, with Gemini excelling across all categories and others showing strengths in specific areas like complex linguistic reasoning. The study highlights the potential of LLMs for educational assessment in low-resource languages like Kazakh, while identifying gaps in model optimization, fairness, and reliability, necessitating targeted fine-tuning and culturally relevant data curation.
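The paired-comparison tests named in the methodology can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the toy response matrix (items × models) is invented, and the functions implement the standard Cochran's Q statistic and the continuity-corrected McNemar statistic from their textbook formulas.

```python
def cochrans_q(responses):
    """Cochran's Q over a binary matrix: rows = items, cols = models.

    Q is asymptotically chi-squared with (k - 1) degrees of freedom,
    where k is the number of models being compared.
    """
    k = len(responses[0])                       # number of models
    row_sums = [sum(r) for r in responses]      # correct answers per item
    col_sums = [sum(c) for c in zip(*responses)]  # correct answers per model
    n = sum(row_sums)                           # total correct answers
    numer = k * (k - 1) * sum((cj - n / k) ** 2 for cj in col_sums)
    denom = k * n - sum(ri ** 2 for ri in row_sums)
    return numer / denom

def mcnemar_stat(a, b):
    """Continuity-corrected McNemar statistic for two paired binary vectors.

    Uses only discordant pairs: items one model answered correctly
    and the other did not. Asymptotically chi-squared with 1 df.
    """
    disc_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    disc_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    return (abs(disc_a - disc_b) - 1) ** 2 / (disc_a + disc_b)

# Hypothetical per-item correctness for three models on six MCQs.
model_a = [1, 1, 1, 1, 1, 0]
model_b = [1, 1, 0, 0, 1, 0]
model_c = [0, 1, 0, 0, 0, 0]
matrix = list(zip(model_a, model_b, model_c))

print(cochrans_q(matrix))          # → 6.0
print(mcnemar_stat(model_a, model_c))  # → 2.25
```

In practice each statistic would be compared against the appropriate chi-squared distribution (k − 1 df for Q, 1 df for McNemar), and the pairwise McNemar p-values corrected for multiple comparisons; the GEE logistic regression mentioned in the abstract additionally models difficulty and topic as covariates while accounting for the repeated measures on the same items.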