Diagnosing translated benchmarks

Overview

This project investigates how automated quality assurance (QA) methods can detect problems in translated benchmark items before those items are used for multilingual LLM evaluation.

Associated publication

Diagnosing Translated Benchmarks: An Automated Quality Assurance Study of the EU20 Benchmark Suite.
Klaudia-Doris Thellmann, Bernhard Stadler, Michael Färber — LREC 2026.

Resources

Resource	Link
Paper	TODO
Code	TODO
Dataset / benchmark material	TODO
Slides / poster	TODO
Blog post	LREC 2026 conference notes

Research questions

How can translated benchmark items be checked systematically and at scale?
Which types of translation artifacts are especially relevant for model evaluation?
How can benchmark QA become part of reproducible multilingual evaluation pipelines?
