I am working on optical character recognition (OCR) and will try to summarize my issue as best I can:
I have a reference sentence and a predicted sentence.
Using the editops function from Levenshtein, I built a list of tuples. Each tuple contains: the operation type (insert, replace, delete), the character affected in the reference sequence, the character affected in the predicted sequence, and finally the number of times this change occurs across the whole reference sentence (more precisely, the maximum number of occurrences of each error pair):
[(('insert', 'e', 'm'), 11), (('insert', 't', 'a'), 8), (('insert', 'r', 'o'), 5), (('replace', 'a', 'e'), 2), (('replace', 't', 'T'), 1), (('replace', 'r', 'R'), 1), (('replace', 'M', 'm'), 1), (('delete', ' ', 'a'), 1), (('replace', 'p', 'o'), 1), (('replace', 't', 'a'), 1), (('replace', 'e', 'e'), 1), (('replace', ' ', 'r'), 1), (('insert', ' ', 'd'), 1), (('replace', ' ', 'd'), 1), (('replace', 'i', 'e'), 1), (('replace', 'l', 's'), 1)]
- Is it possible to build a kind of "confusion matrix" from the list above, using these error pairs and their maximum number of occurrences? Something like this:
Output example
Predicted e m t a r ...continue
Reference
e 1 11 0 0 0
m 0 0 0 0 0
t 0 0 0 8 0
a 2 0 0 0 0
r 0 0 0 0 0
...continue
or like this (without labels):
[[1 11 0 0 0
0 0 0 0 0
0 0 0 8 0
2 0 0 0 0
0 0 0 0 0]]
Note: in these example matrices, 0 is the default value for any character pair that was never encountered.
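To make the target concrete, here is a minimal sketch of how such a matrix could be built with plain Python. The function name build_confusion and its labels parameter are hypothetical, introduced only for illustration. It assumes that when the same (reference, predicted) pair appears under several operations, the largest count is kept, which reproduces the 8 for ('t', 'a') in the example above; summing the counts instead would be an equally plausible choice.

```python
# Error list from the question: ((operation, reference_char, predicted_char), count)
errors = [(('insert', 'e', 'm'), 11), (('insert', 't', 'a'), 8), (('insert', 'r', 'o'), 5),
          (('replace', 'a', 'e'), 2), (('replace', 't', 'T'), 1), (('replace', 'r', 'R'), 1),
          (('replace', 'M', 'm'), 1), (('delete', ' ', 'a'), 1), (('replace', 'p', 'o'), 1),
          (('replace', 't', 'a'), 1), (('replace', 'e', 'e'), 1), (('replace', ' ', 'r'), 1),
          (('insert', ' ', 'd'), 1), (('replace', ' ', 'd'), 1), (('replace', 'i', 'e'), 1),
          (('replace', 'l', 's'), 1)]


def build_confusion(errors, labels=None):
    """Return (labels, matrix) with rows = reference chars, columns = predicted chars.

    Hypothetical helper: the operation type is ignored, and the maximum count
    is kept when a character pair occurs under more than one operation.
    """
    counts = {}
    for (_op, ref_char, pred_char), n in errors:
        key = (ref_char, pred_char)
        counts[key] = max(counts.get(key, 0), n)  # assumption: max, not sum

    if labels is None:
        # Every character seen on either side, in order of first appearance.
        labels = list(dict.fromkeys(
            char for (_op, ref, pred), _n in errors for char in (ref, pred)))

    index = {char: i for i, char in enumerate(labels)}
    # Pairs that never occur keep the default value 0.
    matrix = [[0] * len(labels) for _ in labels]
    for (ref_char, pred_char), n in counts.items():
        if ref_char in index and pred_char in index:
            matrix[index[ref_char]][index[pred_char]] = n
    return labels, matrix


# Restrict to the five characters used in the example output.
labels, matrix = build_confusion(errors, labels=['e', 'm', 't', 'a', 'r'])
for label, row in zip(labels, matrix):
    print(label, row)
```

Calling build_confusion(errors) without labels would give a square matrix over every character that appears in the list, in order of first appearance.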
- Secondly, is it possible to visualize this matrix, for example with matplotlib or seaborn?
Any pointers on how to solve this? Thanks in advance.
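For the visualization, something along these lines is what I have in mind: a rough sketch using matplotlib's imshow on the small 5x5 example matrix above. The function name plot_confusion and the output file name are hypothetical, and the figure is saved to a file rather than shown so that it also works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Small example matrix from the question (rows = reference, columns = predicted).
labels = ['e', 'm', 't', 'a', 'r']
matrix = [[1, 11, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 8, 0],
          [2, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]


def plot_confusion(matrix, labels, path="confusion.png"):
    """Draw the matrix as a heatmap with one annotated cell per character pair."""
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="Blues")

    # Label both axes with the characters.
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Reference")

    # Write the count inside each cell.
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ax.text(j, i, str(value), ha="center", va="center")

    fig.colorbar(image, ax=ax)
    fig.savefig(path)
    plt.close(fig)
    return path


if __name__ == "__main__":
    plot_confusion(matrix, labels)
```

seaborn's heatmap function with annot=True would presumably give a similar picture with less code, but I have not tried it on this data.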