I have two lists, one containing true values selected by humans and a second list with extracted values. I would like to measure how well the pipeline is performing based on how many true values are contained in the extracted list. Example:
extracted_value = ["value", "of", "words", "that", "were", "tracked"]
real_value = ["value", "words", "that"]
I need a metric that describes: 3 out of 3 real values were extracted
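To make the per-document count concrete, here is a minimal sketch of what I mean. The helper name `count_extracted` is only illustrative, and it assumes that values are compared by exact match and that duplicates do not matter (both lists are treated as sets). I believe this ratio is what is usually called recall.

```python
def count_extracted(real_values, extracted_values):
    """Return (hits, total): how many distinct real values appear in the extracted list.

    Assumption: exact string match, duplicates ignored.
    """
    real = set(real_values)
    hits = len(real & set(extracted_values))
    return hits, len(real)


extracted_value = ["value", "of", "words", "that", "were", "tracked"]
real_value = ["value", "words", "that"]

hits, total = count_extracted(real_value, extracted_value)
print(f"{hits} out of {total} real values were extracted")
# prints: 3 out of 3 real values were extracted
```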
For multiple documents:
5 out of 10 real values were extracted
2 out of 3 real values were extracted
1 out of 9 real values were extracted
Based on these per-document comparisons, how can I get a single score that describes how well the extraction performs on average across all documents?
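As far as I can tell there are two obvious ways to combine the per-document counts, sketched below with the three documents from my example. The variable names are my own, and the sketch assumes every document has at least one real value (otherwise the per-document ratio would divide by zero). Averaging the ratios weights each document equally; pooling the counts weights each real value equally.

```python
# (hits, total) per document, taken from the example above
counts = [(5, 10), (2, 3), (1, 9)]

# Option 1: average of the per-document ratios (each document counts equally)
per_document = [hits / total for hits, total in counts]
macro_average = sum(per_document) / len(per_document)

# Option 2: pool all counts first (each real value counts equally)
total_hits = sum(hits for hits, _ in counts)
total_real = sum(total for _, total in counts)
micro_average = total_hits / total_real

print(f"average of ratios: {macro_average:.3f}")  # 0.426
print(f"pooled ratio:      {micro_average:.3f}")  # 0.364
```

The two numbers differ here because the document with only 3 real values has a high ratio but contributes few values to the pooled count.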