Klaudia-Doris Thellmann

I am a Machine Learning Engineer and NLP researcher at TU Dresden, working on multilingual LLM evaluation and scalable benchmarking pipelines. I hold a university degree in computer science and have a background in data management (Big Data architectures & analytics); my current work centers on NLP and large language models (LLMs). My research addresses the reliable evaluation of multilingual LLMs, especially the validity of translated benchmarks, translation-aware evaluation, and culturally robust multilingual benchmarking.

Experience

TU Dresden (ZIH/VDR) — Machine Learning Engineer (NLP/LLMs), since 01/2023
TU Dresden (ScaDS.AI) — Research Associate (NLP), 02/2020–02/2021
Fraunhofer IAIS — Data Scientist / Lecturer (Big Data architectures), 01/2017–06/2019
Fraunhofer IAIS — Software Engineer / Big Data Architect, 07/2013–12/2016
University of Bonn (EIS) — Research Associate (Semantic Web), 01/2014–11/2015

Research Interests

Multilingual LLM evaluation & benchmark design
Translation artifacts & translation-aware metrics
Automated benchmark QA (e.g., MT quality estimation, LLM-as-a-judge)
Human-aligned evaluation / preference validation
Efficient large-scale evaluation on HPC clusters

Contact: klaudia-doris.thellmann [at] tu-dresden [dot] de

GitHub · LinkedIn · Google Scholar · X

News

2026 ACL 2026 — “Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation.”
2026 LREC 2026 — “Diagnosing Translated Benchmarks: An Automated Quality Assurance Study of the EU20 Benchmark Suite.”
2025 EACL 2025 — “Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs.”
2024 EMNLP 2024 — “Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?”
2024 Findings of NAACL 2024 — “Tokenizer Choice For LLM Training: Negligible or Crucial?”