
I've been looking into spelling correction models and I'm trying to find some evaluation metrics. If you consider false negatives to be trying to fix an already correct word, and false positives to be missing an error, then you can calculate precision, recall, and accuracy. However, these metrics say nothing about the quality of the correction model (whether or not it successfully corrected a misspelled word into what the user meant to type); they only evaluate the spell-checking capabilities rather than the correcting capabilities.
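For concreteness, this is roughly the bookkeeping I have in mind (just a sketch with made-up names; it assumes parallel lists of typed words, intended words, and a per-word flagged/not-flagged decision, and the positive/negative labelling follows my definitions above):

```python
# Sketch of the detection-only metrics, using the labelling from the question:
#   positive = "the word was already correct"
#   FN = the checker tried to fix an already correct word
#   FP = the checker missed a real error
def detection_metrics(typed, intended, flagged):
    """typed/intended: parallel lists of words; flagged: True where the checker flagged the word."""
    tp = fp = fn = tn = 0
    for t, i, f in zip(typed, intended, flagged):
        is_correct = (t == i)
        if is_correct and not f:
            tp += 1   # correct word left alone
        elif is_correct and f:
            fn += 1   # "fixed" a word that was already correct
        elif not is_correct and not f:
            fp += 1   # missed a real error
        else:
            tn += 1   # real error that was flagged
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy
```

As noted, none of these numbers check whether the replacement the checker produced was actually the word I meant to type.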

Jonathan

1 Answer


In many languages, the hardest part is picking the correct replacement among many candidates. For example, should lck be lack, lick, lock, ick, or luck? (Out of context, of course, you can't tell!)

So the metric you are looking for is the proportion of accurate corrections. Errors you didn't attempt to fix, and correct words you incorrectly replaced, are going to be drowned out by the errors you found but didn't accurately correct, though you might still want to tally these cases separately.
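Concretely, the tallying might look something like this (only a sketch; the one-triple-per-word data layout and the category names are my own assumptions):

```python
from collections import Counter

# Sketch of the per-word tally described above. Each item is a
# (typed, intended, system_output) triple.
def tally(triples):
    counts = Counter()
    for typed, intended, output in triples:
        if typed == intended:
            counts["correct word replaced" if output != typed else "correct word kept"] += 1
        elif output == intended:
            counts["error accurately corrected"] += 1
        elif output == typed:
            counts["error not attempted"] += 1
        else:
            counts["error found but miscorrected"] += 1
    return counts
```

The headline number is then `error accurately corrected` divided by the total number of real errors (the last three categories combined).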

If your correction candidate ranking algorithm is standalone, you can streamline the process significantly by evaluating it in isolation.
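For example, if the ranker is a function you can call directly, something along these lines would do (again just a sketch; `rank_candidates` is a stand-in for whatever your ranking routine is actually called):

```python
# Sketch: evaluate the candidate ranker in isolation on known (misspelling, gold) pairs.
# rank_candidates(misspelling) is assumed to return candidates ordered best first.
def ranker_accuracy(pairs, rank_candidates, k=1):
    hits = 0
    for misspelling, gold in pairs:
        if gold in rank_candidates(misspelling)[:k]:
            hits += 1
    return hits / len(pairs) if pairs else 0.0
```

Comparing top-1 against, say, top-5 accuracy also tells you how much of the remaining error is due to ranking rather than candidate generation.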

tripleee
  • So I could measure the percentage of errors that were successfully corrected? What would I assign to be false negatives and false positives to get a measurement of the **quality** of the corrections? – Jonathan Jun 30 '16 at 18:12
  • False negatives and false positives don't really make much sense here IMHO; they are a classification measurement, not a correction measurement. Quick googling turned up [*Estimation of quality of service in spelling correction using Kullback–Leibler divergence* (Varol & Bayrak 2011)](http://opensample.info/estimation-of-quality-of-service-in-spelling-correction-using-kullback-leibler-divergence) but I can't tell off hand if that's helpful. (I note a punctuation error in their abstract, tee hee.) – tripleee Jul 01 '16 at 04:13
  • If you want to squeeze this into an FP/FN model, maybe regard as false negatives the errors which the system did not attempt to correct, and as false positives any correction which didn't produce the correct result. True negatives, then, are correctly spelled words which were not changed, and true positives, the successful corrections. (This inverts your meaning of "positive" and "negative" but this makes more sense to me.) – tripleee Jul 01 '16 at 04:21
  • I guess I could put it into an FP/FN model, but I feel like doing so would give results that aren't very representative of how the corrector did. Maybe report an overall accuracy and a separate accuracy of correction for the misspelled words? – Jonathan Jul 03 '16 at 14:42
  • In the long run, a single number or pair of numbers may not be useful, especially if they are not a standard measure. Dividing into accuracy of error detection and accuracy of correction candidate generation may be the way to go. – tripleee Jul 03 '16 at 14:53
  • Perhaps I could use an FP/FN model to calculate accuracy, precision, and recall for error detection, along with the accuracy of the corrections themselves (roughly the split sketched after this thread)? – Jonathan Jul 04 '16 at 19:27
  • Something like that, yeah. – tripleee Jul 05 '16 at 05:38
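To tie the thread together, one way to lay out that split (detection metrics plus a separate correction accuracy) is sketched below; the per-word (typed, intended, system_output) triples and the choice of "positive = real error" are assumptions, not a standard:

```python
# Sketch of the split proposed in the comments: precision/recall/accuracy for
# error *detection* (positive = "the word is a real error"), plus a separate
# accuracy number for the corrections themselves.
def split_metrics(triples):
    """triples: iterable of (typed, intended, system_output) word triples."""
    tp = fp = fn = tn = 0
    corrected = 0
    for typed, intended, output in triples:
        is_error = typed != intended
        changed = output != typed
        if is_error and changed:
            tp += 1                      # error detected, whether or not the fix is right
            if output == intended:
                corrected += 1
        elif is_error:
            fn += 1                      # error the system did not attempt
        elif changed:
            fp += 1                      # correct word that was replaced
        else:
            tn += 1                      # correct word left alone
    total = tp + fp + fn + tn
    detection = {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
    # Correction accuracy over all real errors; using only the attempted ones (tp)
    # as the denominator is another reasonable choice.
    correction_accuracy = corrected / (tp + fn) if tp + fn else 0.0
    return detection, correction_accuracy
```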