I am trying to understand machine translation evaluation metrics.
I understand what the BLEU score is trying to achieve: it computes n-gram precision (BLEU-1, BLEU-2, BLEU-3, BLEU-4) by matching the candidate translation against the human-written reference translation.
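For context, here is roughly how I compute BLEU with NLTK (a toy example of my own; the sentences, tokenization, and smoothing choice are just illustrative):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One human-written reference and one MT output, both pre-tokenized
reference = [["the", "cat", "sat", "on", "the", "mat"]]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# Cumulative BLEU-4: 1- to 4-gram precision with equal weights;
# smoothing avoids a zero score when a higher-order n-gram has no match
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(score)
```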
However, I can't really understand what the METEOR score measures when evaluating MT quality. I am trying to understand the rationale intuitively. I have already looked at several blog posts but can't really figure it out.
How are these two evaluation metrics different, and when is each one relevant?
Can anyone please help?