
I have grayscale images taken with a cheap camera, and I need to write an OCR program. The main problem is noise, or objects that are not text but still appear in the binarized image. So now I am thinking about how to extract just the text from the image.

I need a good algorithm for that. Can you suggest one? For example, if the image contains black text and something like a black line, the algorithm should select only the text and leave out the line.

user229044
maximus
  • OCR has been around a while. Any reason you *need* to build your own, rather than look for existing tools? As for algorithms, I'm sure there are many. Usually, you need to have existing templates of what characters you're searching for, and then have the program see if any of those templates exist in the image. – FrustratedWithFormsDesigner Apr 07 '10 at 15:30
  • Other tools work only if the image contains clear text; even OCR for handwritten text does not do well on these images. I tried Tesseract (Google) and GOCR. What do you mean by templates of characters? – maximus Apr 07 '10 at 15:39
  • @maximus: the OCR system needs some reference point; it needs to know what a proper "A" looks like before it can recognize an image that *might* be an "A". There is probably more than one way to achieve this; what I suggested would be only one of those ways. – FrustratedWithFormsDesigner Apr 07 '10 at 15:42
  • This question is a duplicate of http://stackoverflow.com/questions/1848/locating-text-within-image – tom10 Apr 07 '10 at 19:50
  • You might look here: http://stackoverflow.com/questions/1284214/simple-ocr-programming-tutorials-articles – phimuemue Apr 10 '10 at 11:39
  • There are definitely OCR engines out there that deal with low-quality images (noisy, grainy, etc.) very well. One such engine is from ABBYY. There's an online, pay-per-page API that uses the ABBYY OCR engine: http://www.wisetrend.com/wisetrend_ocr_cloud.shtml – Eugene Osovetsky Jun 23 '10 at 05:47

1 Answer


You describe two types of noise you want to remove. (As an aside, the Wikipedia page on noise reduction isn't bad; look at the "in images" section.)

One type is isolated dots. This is often called "speckle" or "salt and pepper" noise, and is usually removed with a median filter or some similar neighborhood filter. There's a good page describing some algorithms for this at MathWorks.
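As a rough illustration of speckle removal, here is a minimal sketch of a 3x3 median filter, one common choice for salt-and-pepper noise. It assumes the image is a plain list of rows of integer gray levels; the function name `median_filter_3x3` is made up for this example, and a real program would more likely call an image library's built-in filter.

```python
def median_filter_3x3(image):
    """Return a filtered copy of `image` (a list of rows of ints).

    Each pixel is replaced by the median of its 3x3 neighborhood.
    Border pixels use only the neighbors that actually exist.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]  # the input is left unchanged
    for y in range(height):
        for x in range(width):
            window = sorted(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            # Middle element of the sorted window (upper median when
            # the window has an even number of pixels, as at borders).
            result[y][x] = window[len(window) // 2]
    return result


if __name__ == "__main__":
    # A white 5x5 image with one black speckle in the middle.
    noisy = [[255] * 5 for _ in range(5)]
    noisy[2][2] = 0
    for row in median_filter_3x3(noisy):
        print(row)
```

A single isolated dot is outvoted by its neighbors and disappears, while strokes a few pixels wide mostly survive. Strokes only one pixel wide can be eroded too, so the window size has to suit the text size.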

The second type is lines. This is harder, and I wouldn't really describe it as noise; how to handle it depends on your input image type. This paper seems appropriate, but it isn't available for free online, so you might have to buy it or go to your local university library.
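To make the line problem a bit more concrete, here is a small sketch of one simple heuristic. It is not the method from the paper mentioned above: it just erases horizontal runs of ink that are longer than any character stroke is expected to be. It assumes a binarized image stored as a list of rows where 1 means ink and 0 means background; the function name `remove_long_horizontal_runs` and the `max_run` threshold are made up for this example.

```python
def remove_long_horizontal_runs(binary, max_run):
    """Return a copy of `binary` with long horizontal ink runs cleared.

    `binary` is a list of rows of 0/1 values, where 1 means ink.
    Any horizontal run of ink longer than `max_run` pixels is treated
    as part of a line and set to 0. Shorter runs are kept.
    """
    result = [row[:] for row in binary]
    for y, row in enumerate(binary):
        width = len(row)
        x = 0
        while x < width:
            if row[x] == 1:
                start = x
                # Advance to the end of this run of ink.
                while x < width and row[x] == 1:
                    x += 1
                if x - start > max_run:
                    for i in range(start, x):
                        result[y][i] = 0
            else:
                x += 1
    return result


if __name__ == "__main__":
    page = [
        [0, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],  # a ruled line across the page
        [0, 1, 0, 0, 1, 1, 0, 0],
    ]
    for row in remove_long_horizontal_runs(page, max_run=4):
        print(row)
```

This only handles straight horizontal lines, and it also cuts through any character that touches the line, so treat it as a starting point rather than a solution. Vertical lines can be handled the same way on the transposed image.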

You might also want to look at this, which is downloadable from many places, but is really for motion pictures (video), so probably not what you want.

Nick Fortescue