17

I am trying to extract numbers from a typical scoreboard of the kind you would find in a high school gym. Each number is in a digital "alarm clock" font, and I have managed to perspective-correct, threshold, and extract a given digit from the video feed.

Sample input

Here's a sample of my template input

Template input

My problem is that no single classification method accurately identifies all digits 0-9. I have tried several methods:

1) Tesseract OCR - this one consistently gets 4 wrong and frequently returns weird results. I am just using the command-line version. If I actually try to train it on an "alarm clock" font, I get "unknown character" every time.

2) kNearest with OpenCV - I search a database consisting of my template images (0-9) and see which one is nearest. I frequently get confusion between 3/1 and 7/1

3) cvMatchShapes - this one is fairly bad, it usually can't tell the difference between 2 of the digits for each input digit

4) Tangent Distance - This one is the closest, but the smallest tangent distance between the input and my templates ends up mapping "7" to "1" every time

I'm really at a loss to get a classification algorithm for such a simple problem. I feel I have cleaned up the input fairly well and it's a fairly simple case for classification but I can't get anything reliable enough to actually use in practice. Any ideas about where to look for classification algorithms, or how to use them correctly would be appreciated. Am I not cleaning up the input? What about a better input database? I don't know what else I'd use for input, each digit and template looks spot on at this point.

pyromanfo

5 Answers

10

The classical approach to digit recognition, which should work well in this case, is to crop the image tightly around the digit and resize it to 4x4 pixels.

A Discrete Cosine Transform (DCT) can be used to further slim down the search space. You could select the first 4-6 values.
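As a rough illustration of the DCT step, here is a minimal pure-Python sketch. It assumes the digit has already been shrunk to a small grid of brightness values, and it picks the "first" coefficients in order of increasing frequency (u + v); that ordering, and the names `dct_2d` and `low_frequency_features`, are my own assumptions rather than anything prescribed above. In practice a library routine (OpenCV has a DCT function) would replace the hand-written loops.

```python
import math

def dct_2d(grid):
    """Unnormalised 2-D DCT-II of a small grid (a list of equal-length rows)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for y in range(rows):
                for x in range(cols):
                    total += (grid[y][x]
                              * math.cos(math.pi * (2 * y + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * x + 1) * v / (2 * cols)))
            out[u][v] = total
    return out

def low_frequency_features(grid, count=6):
    """Return the `count` lowest-frequency coefficients, ordered by u + v."""
    coeffs = dct_2d(grid)
    order = sorted(
        ((u, v) for u in range(len(coeffs)) for v in range(len(coeffs[0]))),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return [coeffs[u][v] for u, v in order[:count]]
```

The returned list is the feature vector you would hand to the classifier.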

With those values, train a classifier. SVM is a good one, readily available in OpenCV.

It is not as simple as Emma's or Martin's suggestions, but it's more elegant and, I think, more robust.

Given the width/height ratio of your input, you may choose a different resolution, like 3x4. Choose the smallest one that retains readable digits.
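To make the shrink-then-classify idea concrete, here is a small sketch in plain Python. It assumes the cropped digit is a binary image stored as a list of rows of 0/1 values, and it uses a nearest-template lookup instead of an SVM purely to keep the example short; the function names are hypothetical, not OpenCV calls.

```python
def downsample(image, out_rows, out_cols):
    """Shrink a binary image (rows of 0/1) by averaging each block of pixels."""
    in_rows, in_cols = len(image), len(image[0])
    grid = []
    for r in range(out_rows):
        y0 = r * in_rows // out_rows
        y1 = max((r + 1) * in_rows // out_rows, y0 + 1)
        row = []
        for c in range(out_cols):
            x0 = c * in_cols // out_cols
            x1 = max((c + 1) * in_cols // out_cols, x0 + 1)
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid

def flatten(grid):
    """Turn a grid into a flat feature vector."""
    return [value for row in grid for value in row]

def nearest_label(features, templates):
    """Return the label whose template vector is closest (squared distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(templates, key=lambda label: dist(features, templates[label]))
```

Here `templates` would be a dict mapping each digit label to the flattened, downsampled version of its template image, so that inputs and templates are compared at the same tiny resolution.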

Sam
  • I used a 3x5 image (similar to rows/columns in digital display) and it works great with kNearest searching. Dead on. Thanks! – pyromanfo Nov 09 '11 at 21:36
6

Given the highly regular nature of your input, you could define a set of 7 target areas of the image to check. Each area should encompass some significant portion of one of the 7 segments of each digit of the display, but not overlap.

You can then check each area and average the color / brightness of its pixels to generate a probability that the segment is on or off. If the probability is decisive for all areas, you can then easily figure out what the digit is.

It's not as elegant as a pure ML type algorithm, but ML is far more suited to inputs which are not regular, and in this case that does not seem to apply - so you trade elegance for accuracy.
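A minimal sketch of that region check, in plain Python, might look like the following. It assumes an upright binary image (rows of 0/1, lit pixels as 1) and the conventional a-g segment naming; the `REGIONS` boxes are made-up percentages that would need tuning to the actual font, especially if the digits are slanted.

```python
# Segment names: a=top, b=top-right, c=bottom-right, d=bottom,
# e=bottom-left, f=top-left, g=middle.
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"): 0, frozenset("bc"): 1, frozenset("abdeg"): 2,
    frozenset("abcdg"): 3, frozenset("bcfg"): 4, frozenset("acdfg"): 5,
    frozenset("acdefg"): 6, frozenset("abc"): 7, frozenset("abcdefg"): 8,
    frozenset("abcdfg"): 9,
}

# Hypothetical sample boxes as percentages of the digit's bounding box:
# (top, bottom, left, right). These are assumptions, not measured values.
REGIONS = {
    "a": (0, 15, 25, 75), "b": (15, 45, 80, 100), "c": (55, 85, 80, 100),
    "d": (85, 100, 25, 75), "e": (55, 85, 0, 20), "f": (15, 45, 0, 20),
    "g": (43, 57, 25, 75),
}

def mean_in_region(image, region):
    """Average pixel value inside one percentage box of the image."""
    top, bottom, left, right = region
    rows, cols = len(image), len(image[0])
    y0, x0 = top * rows // 100, left * cols // 100
    y1 = max(bottom * rows // 100, y0 + 1)
    x1 = max(right * cols // 100, x0 + 1)
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(pixels) / len(pixels)

def decode_digit(image, threshold=0.5):
    """Return the digit for the lit segments, or None if the pattern is unknown."""
    lit = frozenset(name for name, region in REGIONS.items()
                    if mean_in_region(image, region) > threshold)
    return SEGMENTS_TO_DIGIT.get(lit)
```

Returning `None` for an unrecognised pattern gives you a cheap way to flag frames where the threshold or the region boxes are off, instead of silently guessing. Some displays draw 6, 7, or 9 with one segment more or fewer, so the lookup table may need extra entries.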

Emma
4

Might sound silly, but have you tried simply checking for black bars vertically and then horizontally in the top and bottom halves, left and right of the centerline?

Martin Beckett
  • I implemented your simple idea for a problem I was working on and it works fine. Just had to check for big black bars vertically from the left, while also capturing the first vertical bar 'stats'. – CopyPasteIt Nov 18 '19 at 16:37
2

If you are trying text recognition with Tesseract, try passing not a single digit but a string of duplicated digits; sometimes this produces better results, here's the example. However, if you're planning business software, you may want to have a look at a commercial OCR SDK. For example, try ABBYY FineReader Engine. It's not affordable for free-to-use applications, but for a business product it can add good value. As far as I know, ABBYY provides the best OCR quality; for example, check out http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software_comparison

Nikolay
0

You want your scoreboard image inputs S fed to an algorithm that maps them to {0,1,2,3,4,5,6,7,8,9}.

Let V denote the set of n-tuples of integers.

Construct an algorithm α that maps each image S to an n-tuple

(k1,k2,...,kn)

that can differentiate between two different scoreboard digits.

If you can specify the range of α then you only have to collect the vectors in V that correspond to a digit in order to solve the problem.

I've applied this approach using Martin Beckett's idea and it works. My initial attempt was a simple mapping into a 2-tuple by vertical left-to-right summing, where the first integer was an image column offset and the second was the length of a 'nice' vertical line.

This did not work: images for 6 and 8 mapped to the same vector. So I needed one more piece of information for my digit input types (they are not scoreboard digits), and a 3-tuple feature vector does the trick.
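The tuple-lookup idea, including the kind of collision described for 6 and 8, can be sketched roughly as follows. The feature function `alpha` here is only my guess at a 2-tuple of that flavour (column offset of the first long vertical run, and its length), and `min_run` is an arbitrary illustrative parameter; the third component actually used is not specified, so it is not reproduced.

```python
def longest_run(column):
    """Length of the longest run of consecutive lit pixels in a column."""
    best = current = 0
    for value in column:
        current = current + 1 if value else 0
        best = max(best, current)
    return best

def alpha(image, min_run=3):
    """Hypothetical 2-tuple feature: (offset of the first column containing a
    vertical run of at least min_run pixels, length of that run).
    Returns (-1, 0) when no column qualifies."""
    for x in range(len(image[0])):
        run = longest_run([row[x] for row in image])
        if run >= min_run:
            return (x, run)
    return (-1, 0)

def build_table(samples, feature_fn):
    """Map each feature tuple to the set of labels that produced it."""
    table = {}
    for label, image in samples:
        table.setdefault(feature_fn(image), set()).add(label)
    return table

def collisions(table):
    """Feature tuples shared by several labels; these need another feature."""
    return {vec: labels for vec, labels in table.items() if len(labels) > 1}
```

Running `collisions` over the table built from your labelled samples tells you up front whether the tuple is rich enough, before you rely on it for classification.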

CopyPasteIt