Machine Translation

In May 2026, I presented our work “Diagnosing Translated Benchmarks: An Automated Quality Assurance Study of the EU20 Benchmark Suite” at LREC in Palma, Mallorca. The main theme I took away from the conference was that multilingual evaluation is moving beyond simply translating English benchmarks. We also need to ask whether the translated evaluation data is structurally sound, semantically reliable, and documented well enough to support fair model comparisons. This post briefly summarizes our paper and my personal takeaways from the conference. ...