Let's say I have two string lists in Python (but this problem is not really language-specific):
a = ["cat", "dog", "fish"]
b = ["cat", "dog", "fish"]
My goal is to quantify the difference between those two lists. More specifically, my program has to calculate how "alike" list 1 is to list 2 and give it a "score". I use this to calculate the error in some results: I process some audio and get a list as the result, and I want to compare that result to the result I should have gotten.
Therefore, in the above example the result is identical to the correct result, so the answer should be 1 (100%).
In this case:
a = ["cat", "dog", "fish", "lion"]
b = ["cat", "dog", "fish", "tiger"]
The result is 0.75 (75%), since three of the four elements match.
Here is my code:
def compare_lists(result, correct):
    # TODO: This could be way better.
    if len(result) != len(correct):
        return 0
    else:
        matches = 0
        for i in range(0, len(result)):
            if result[i] == correct[i]:
                matches += 1
        return float(matches) / float(len(result))
However, problems arise when the lists have different lengths. For example:
a = ["cat", "dog", "zebra", "fish"]
b = ["dog", "zebra", "fish"]
The logic described before cannot be applied here. In this case, b is almost the same as a, but a has one extra element at the beginning. I want to quantify this "similarity" correctly: my current algorithm returns 0, but in reality my result and the correct result do not differ much.
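To make the kind of score I am after more concrete, here is a minimal sketch that assumes the standard library's difflib.SequenceMatcher is an acceptable measure. That is only an assumption on my part, not something I have settled on, and the function name similarity is hypothetical. Its ratio() is 2*M/T, where M is the number of matched elements and T is the total number of elements in both lists, so it tolerates insertions and deletions instead of comparing position by position.

```python
from difflib import SequenceMatcher


def similarity(result, correct):
    """Return a score in [0, 1] describing how alike two lists are.

    Hypothetical helper: uses SequenceMatcher.ratio(), i.e. 2*M/T, where
    M is the number of matching elements found in order and T is
    len(result) + len(correct). autojunk is disabled so that frequent
    elements in long lists are not silently ignored.
    """
    matcher = SequenceMatcher(None, result, correct, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Identical lists: every element matches.
    print(similarity(["cat", "dog", "fish"], ["cat", "dog", "fish"]))
    # Same length, one element differs: 2*3/8.
    print(similarity(["cat", "dog", "fish", "lion"],
                     ["cat", "dog", "fish", "tiger"]))
    # Different lengths, one extra element at the start: 2*3/7.
    print(similarity(["cat", "dog", "zebra", "fish"],
                     ["dog", "zebra", "fish"]))
```

With this measure the first two examples still score 1 and 0.75, and the different-length example scores 6/7 (about 0.86) instead of 0. Whether that weighting is right for measuring my error is exactly what I am unsure about.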