
I am trying to build a system (in C#) that can recognize text in scene images. Scene text recognition is a challenging task because of low resolution, complex backgrounds, non-uniform lighting, and blurring effects...

Any ideas for overcoming this problem would be appreciated.

Asked by vudh (edited by rcs)

3 Answers


I would like to suggest the following papers for an overview of all the techniques proposed in this field:

  • K. Jung, K. I. Kim, and A. K. Jain. "Text information extraction in images and video: A survey." Pattern Recognition, 37(5):977-997, 2004.
  • J. Liang, D. Doermann, and H. Li. "Camera-based analysis of text and documents: A survey." International Journal on Document Analysis and Recognition, 7(2-3):83-104, July 2005.

Although the ultimate goal is to recognize the characters in the scene, locating the text regions and extracting the text from them is more difficult than the character recognition (OCR) step itself.
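To make the "find the text regions" step more concrete, here is a minimal Java sketch (Java because it ports almost line-for-line to C#) of a common connected-component approach: label the 8-connected foreground blobs of an already-binarized image and keep those whose size and shape look plausible for a character. The class `CandidateRegions`, its methods, and the `minArea`/`maxAspect` thresholds are hypothetical and chosen only for illustration; real systems tune such filters on their data and add further checks, such as grouping blobs into text lines.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: connected-component labeling of a binary image
 * (true = foreground) followed by a crude geometric filter.
 * The thresholds passed to characterCandidates are illustrative assumptions.
 */
final class CandidateRegions {

    /** Axis-aligned bounding box and pixel count of one connected component. */
    static final class Box {
        public final int minX, minY, maxX, maxY, area;

        Box(int minX, int minY, int maxX, int maxY, int area) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY; this.area = area;
        }

        public int width()  { return maxX - minX + 1; }
        public int height() { return maxY - minY + 1; }
    }

    /** Labels 8-connected foreground components in raster order and returns their boxes. */
    static List<Box> components(boolean[][] img) {
        int h = img.length, w = img[0].length;
        boolean[][] seen = new boolean[h][w];
        List<Box> boxes = new ArrayList<>();
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!img[y][x] || seen[y][x]) continue;
                // Breadth-first flood fill starting at an unvisited foreground pixel.
                int minX = x, minY = y, maxX = x, maxY = y, area = 0;
                seen[y][x] = true;
                queue.add(new int[]{x, y});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    int px = p[0], py = p[1];
                    area++;
                    minX = Math.min(minX, px); maxX = Math.max(maxX, px);
                    minY = Math.min(minY, py); maxY = Math.max(maxY, py);
                    // Visit all 8 neighbours that are foreground and not yet seen.
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = px + dx, ny = py + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h && img[ny][nx] && !seen[ny][nx]) {
                                seen[ny][nx] = true;
                                queue.add(new int[]{nx, ny});
                            }
                        }
                    }
                }
                boxes.add(new Box(minX, minY, maxX, maxY, area));
            }
        }
        return boxes;
    }

    /** Keeps components that are large enough and not too elongated to be a character. */
    static List<Box> characterCandidates(List<Box> boxes, int minArea, double maxAspect) {
        List<Box> out = new ArrayList<>();
        for (Box b : boxes) {
            double aspect = (double) Math.max(b.width(), b.height()) / Math.min(b.width(), b.height());
            if (b.area >= minArea && aspect <= maxAspect) out.add(b);
        }
        return out;
    }
}
```

Note that touching or overlapping characters end up in a single component with this approach, which is exactly the segmentation difficulty raised in the comments on the last answer.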

Answered by feelfree (edited by Jonathan Spooner)

I suggest that you begin by checking out some open-source text-recognition libraries. See, for example, this thread.

Answered by nojka_kruva (edited by Community)

The Stroke Width Transform (SWT) can be used to extract text from natural images.

See this Stack Overflow page: Stroke Width Transform (SWT) implementation (Java, C#...)

Here's a helpful video: http://videolectures.net/cvpr2010_epshtein_dtns/
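The core idea of SWT can be sketched in a few dozen lines. This is a simplified, hypothetical Java version of the first pass described by Epshtein et al. (CVPR 2010): for every edge pixel, cast a ray across the stroke (against the gradient for dark text on a light background) until another edge pixel is hit; if that pixel's gradient is roughly opposite (within 30 degrees), assign the ray length as the stroke width of every pixel on the ray. Assumptions, not taken from the thread: the input is a grayscale `double[h][w]`; Sobel gradients with a simple non-maximum suppression stand in for the Canny detector used in the paper; the second (median) pass, component grouping, and text-line formation are omitted. The class name `StrokeWidth` and its parameters are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified first pass of the Stroke Width Transform. */
final class StrokeWidth {

    static double[][] transform(double[][] gray, double edgeThreshold, boolean darkOnLight) {
        int h = gray.length, w = gray[0].length;
        double[][] gx = new double[h][w], gy = new double[h][w], mag = new double[h][w];
        // 1. Sobel gradients on interior pixels (border pixels keep a zero gradient).
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                gx[y][x] = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
                         - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]);
                gy[y][x] = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
                         - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]);
                mag[y][x] = Math.sqrt(gx[y][x] * gx[y][x] + gy[y][x] * gy[y][x]);
            }
        }
        // 2. Thin edges: keep pixels whose magnitude is a local maximum along the
        //    gradient direction, quantized to 4 directions as in Canny's method.
        boolean[][] edge = new boolean[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                if (mag[y][x] <= edgeThreshold) continue;
                double angle = Math.toDegrees(Math.atan2(gy[y][x], gx[y][x]));
                if (angle < 0) angle += 180;
                int ox, oy;
                if (angle < 22.5 || angle >= 157.5) { ox = 1; oy = 0; }
                else if (angle < 67.5) { ox = 1; oy = 1; }
                else if (angle < 112.5) { ox = 0; oy = 1; }
                else { ox = -1; oy = 1; }
                // Strict on one side, non-strict on the other, so a plateau keeps one pixel.
                edge[y][x] = mag[y][x] > mag[y - oy][x - ox] && mag[y][x] >= mag[y + oy][x + ox];
            }
        }
        // 3. Ray casting; pixels never crossed by an accepted ray stay at +infinity.
        double[][] sw = new double[h][w];
        for (double[] row : sw) Arrays.fill(row, Double.POSITIVE_INFINITY);
        double sign = darkOnLight ? -1 : 1;
        double cosLimit = Math.cos(Math.PI / 6);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!edge[y][x]) continue;
                double dx = sign * gx[y][x] / mag[y][x], dy = sign * gy[y][x] / mag[y][x];
                List<int[]> ray = new ArrayList<>();
                ray.add(new int[]{x, y});
                for (int n = 1; ; n++) {
                    int qx = (int) Math.round(x + dx * n), qy = (int) Math.round(y + dy * n);
                    if (qx < 0 || qy < 0 || qx >= w || qy >= h) break;
                    ray.add(new int[]{qx, qy});
                    if (!edge[qy][qx]) continue;
                    // Cosine of the angle between the two gradients; opposite means <= -cos(30 deg).
                    double dot = (gx[y][x] * gx[qy][qx] + gy[y][x] * gy[qy][qx]) / (mag[y][x] * mag[qy][qx]);
                    if (dot <= -cosLimit) {
                        double width = Math.sqrt((qx - x) * (qx - x) + (qy - y) * (qy - y));
                        for (int[] p : ray) sw[p[1]][p[0]] = Math.min(sw[p[1]][p[0]], width);
                    }
                    break; // the first edge pixel hit ends the ray, accepted or not
                }
            }
        }
        return sw;
    }
}
```

Because text strokes tend to have a nearly constant width, regions whose stroke-width values vary little are good text candidates. The paper runs the transform with both polarities (`darkOnLight` true and false) so that light text on a dark background is handled as well.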

Answered by Rethunk (edited by Community)
  • Thank you for your help, Rethunk. But for now I am focusing only on text recognition, not on the text detection described in the paper. The first step of my problem is segmenting the characters in scene images. I tried a binarization method, but it does not help when characters overlap. Any ideas for this case? Many thanks. – vudh Jan 11 '12 at 09:27
  • Without SWT or a similar algorithm, you will have a hard time distinguishing text from the background in most images unless the contrast is very high. Binarization works well enough for black text on a white background. Look into local (adaptive) thresholding techniques. Rather than reinventing the wide variety of known algorithms, review those in the image-processing textbook by Gonzalez and Woods and the survey of OCR techniques in the book Character Recognition Systems by Cheriet, Kharma, Liu, and Suen. There is no short answer to your question if you're trying to develop your own OCR library. – Rethunk Jan 14 '12 at 17:18
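As an illustration of the local thresholding mentioned above, here is a minimal Java sketch of Sauvola's method, one well-known local technique (Niblack's is another). Each pixel is compared with a threshold computed from the mean m and standard deviation s of its (2*radius+1) x (2*radius+1) neighborhood, T = m * (1 + k * (s / R - 1)), and is marked as dark text if it falls below T; integral images make each window's statistics O(1). The class name `SauvolaThreshold` and the parameter values used in the example (radius 2, k = 0.5, R = 128) are assumptions for illustration only; in practice the window is usually much larger, on the order of a character's size.

```java
/** Hypothetical sketch of Sauvola local thresholding using integral images. */
final class SauvolaThreshold {

    /** Returns true for pixels darker than their local Sauvola threshold (dark text). */
    static boolean[][] binarize(double[][] gray, int radius, double k, double r) {
        int h = gray.length, w = gray[0].length;
        // Integral images with an extra zero row/column: s1 = sum, s2 = sum of squares.
        double[][] s1 = new double[h + 1][w + 1], s2 = new double[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double v = gray[y][x];
                s1[y + 1][x + 1] = v + s1[y][x + 1] + s1[y + 1][x] - s1[y][x];
                s2[y + 1][x + 1] = v * v + s2[y][x + 1] + s2[y + 1][x] - s2[y][x];
            }
        }
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Square window centred on (x, y), clipped to the image borders.
                int x0 = Math.max(0, x - radius), x1 = Math.min(w - 1, x + radius);
                int y0 = Math.max(0, y - radius), y1 = Math.min(h - 1, y + radius);
                double n = (double) (x1 - x0 + 1) * (y1 - y0 + 1);
                double sum = s1[y1 + 1][x1 + 1] - s1[y0][x1 + 1] - s1[y1 + 1][x0] + s1[y0][x0];
                double sq = s2[y1 + 1][x1 + 1] - s2[y0][x1 + 1] - s2[y1 + 1][x0] + s2[y0][x0];
                double mean = sum / n;
                // Clamp tiny negative variances caused by floating-point rounding.
                double std = Math.sqrt(Math.max(0, sq / n - mean * mean));
                double t = mean * (1 + k * (std / r - 1));
                out[y][x] = gray[y][x] < t;
            }
        }
        return out;
    }
}
```

Because the threshold follows the local mean, a pixel of value 110 inside a bright (220) region is marked as text while a uniformly dark (100) region stays background; no single global threshold can separate both cases correctly.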