
I've been working on this project for a few months now and need some assistance. I am trying to recognize the characters in an image from an old camera that used a seven-segment LED to imprint the frame number in the lower right-hand corner of the image. The issue is that these images are of people, and on subjects with lighter skin tones the skin interferes with character detection. In general, most of the images can't be recognized by conventional OCR. I have been attempting to use Tesseract, but have been unsuccessful even in compiling the application to test and train it. I am here to ask whether anyone else has had a similar problem or knows of an alternative to Tesseract that can recognize these characters, preferably a trainable OCR. My searches have come up null.

example image

Acfarris1
  • How many images do you want to process? Because you might need to do machine learning, which is difficult to get correct. For small numbers (like 1000), you might be better off doing data entry manually or hiring labor through Mechanical Turk or something? – Nayuki Nov 12 '15 at 05:39
  • On the heaviest days it would be 3,500 a day, I believe, but the average is around 1k – Acfarris1 Nov 12 '15 at 16:05
  • It's a continuous stream of work? I guess you do want some automation. – Nayuki Nov 12 '15 at 16:29
  • This OCR will be plugged into another external app we are building. – Acfarris1 Nov 12 '15 at 17:06

1 Answer


Most OCRs have trouble with image backgrounds, so you should first try to isolate the text. As this is computer-rendered text:

  1. the text is most likely in the same place in every image

    So there is no need to search for it.

  2. the text is most likely always rendered with the same font

    If you have that font, it will ease things up a lot, and you can use even simple methods like per-pixel comparison or a correlation coefficient, with better results than neural-network-based classification. You can also try this simple OCR.

  3. you can detect the "exact" color of the text, filtering out all the rest

    Try to detect whether the text is solid or transparent (added/XORed onto the image pixels). Either way, after this it should not be too hard to detect the text pixels. Once you can detect your text pixels reliably, black out everything else and then use OCR.

Spektre
  • The thing is, I can usually isolate the characters on darker-skinned individuals, but when someone with a lighter skin tone is involved I can't detect the difference between the skin and the characters. – Acfarris1 Nov 12 '15 at 16:04
  • @Acfarris1 First try to make images with white, red, green, blue, black, and shades-of-gray backgrounds to determine how the text is added to the image. It is hard to say from a single image, but it looks like some kind of blending (color is added to the original pixel); if that is true, then the non-saturated edge should be detectable pretty reliably and easily. But that is just a first-look idea... as the segments are distorted either by image compression or some kind of anti-aliasing effect, it is hard to tell – Spektre Nov 12 '15 at 18:07
  • I know that with the cameras being used, the frame count is not added digitally but by capturing the light from an LED emitter onto the light sensor (real old school) – Acfarris1 Nov 12 '15 at 19:45
  • @Acfarris1 In that case you should try to scan for bumps in all channels that match the expected shape and thickness and are proportional across channels. So first find the properties of the segment light, and then from a scan line search for something similar to this: [How to find horizon line efficiently in a high-altitude photo?](http://stackoverflow.com/a/22195176/2521214) – Spektre Nov 13 '15 at 08:04
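The scan-line idea from the last comment could look roughly like this sketch: walk along one image row, and report runs that rise sharply above the local baseline and have a plausible segment width. The thresholds, the width range, and the name `find_bumps` are illustrative assumptions that would need tuning against real frames:

```python
def find_bumps(scanline, min_rise=30, width_range=(2, 6)):
    """Return start indices of bumps in one channel's scan line.

    A bump is a run of values at least min_rise above the value just
    before the run (the local baseline), whose width in pixels falls
    inside width_range -- i.e. it looks like an LED segment, not noise
    or a broad skin-tone gradient."""
    bumps = []
    i, n = 1, len(scanline)
    while i < n:
        if scanline[i] - scanline[i - 1] >= min_rise:   # sharp rise
            start = i
            while i < n and scanline[i] - scanline[start - 1] >= min_rise:
                i += 1
            width = i - start
            if width_range[0] <= width <= width_range[1]:
                bumps.append(start)
        else:
            i += 1
    return bumps

# Two narrow bright runs over a dark baseline:
line = [10, 10, 10, 80, 85, 80, 10, 10, 12, 11, 90, 88, 10]
print(find_bumps(line))  # -> [3, 10]
```

Running the same detector on each color channel and keeping only bumps that appear at the same position in all of them should suppress most skin-tone false positives, since the LED light raises all channels together.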