Towards Neural Similarity Evaluators

Abstract

We review the limitations of BLEU and ROUGE, the most popular metrics used to assess reference summaries against hypothesis summaries, propose criteria for how a good metric should behave, and suggest concrete ways to use and test recent Transformer-based language models for the same task.

Publication
In Document Intelligence Workshop, NeurIPS'19
Muhammed Yusuf Kocyigit
PhD Student at Boston University

My research interests include better evaluating and improving LLMs, understanding the pre-training data of LLMs, and computational social science.