My problem
I have two lists, 'predicted' and 'reference'. Each list contains strings: the first holds the elements output by my model, and the second holds the gold standard. I want to build an automatic error classifier, but I can't figure out how to compare each character within each string in each list. I can compare word-wise (code included below), but I want to compare character by character.
Below is the code for my word-wise comparer, along with the lists of data I'm working with. NB: outside of this toy example, these lists are about 3000 items long.
predicted = ['r * a k t\n', 'd * o u l\n', 'm * i s l\n', 'p * i . v @ p\n']
reference = ['r A k t\n', 'd * o u b\n', 'm * i s l\n', 'i * p . v @ t\n']
########### word-wise finder ##############
p = set(predicted)
r = set(reference)
errors = p - r
print(errors)
The code above gives me:
'r * a k t\n', 'd * o u l\n', 'p * i . v @ p\n'
My dream would be to have a returned list that looks like this:
['* a', 'l', 'p * i', 'p']
because I can then look at each element and classify the mistake it represents. Any advice is appreciated.
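For illustration, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the two lists are aligned item by item and that the symbols in each string are separated by whitespace. It uses `difflib.SequenceMatcher` from the standard library; the function name `token_errors` is hypothetical.

```python
from difflib import SequenceMatcher

def token_errors(predicted, reference):
    """Return (predicted_part, reference_part) pairs for each mismatching span."""
    errors = []
    # Assumption: predicted[i] corresponds to reference[i].
    for pred, ref in zip(predicted, reference):
        # Assumption: symbols are whitespace-separated; split() also drops '\n'.
        p_tokens, r_tokens = pred.split(), ref.split()
        matcher = SequenceMatcher(None, p_tokens, r_tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != 'equal':
                errors.append((' '.join(p_tokens[i1:i2]),
                               ' '.join(r_tokens[j1:j2])))
    return errors

predicted = ['r * a k t\n', 'd * o u l\n', 'm * i s l\n', 'p * i . v @ p\n']
reference = ['r A k t\n', 'd * o u b\n', 'm * i s l\n', 'i * p . v @ t\n']
print(token_errors(predicted, reference))
```

On the toy data, the first two pairs come out as `('* a', 'A')` and `('l', 'b')`. Keeping the reference side next to the predicted side may make classification easier. Note that `SequenceMatcher` aligns on the longest matching blocks, so swapped symbols (as in the last item) can be split into several spans rather than reported as one span like 'p * i'.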