
I'd like to know if there are any techniques/APIs that can be used to do fast screen fonts OCR?

The following is taken for granted:

  • the text to OCR shall come from a screenshot and shall be rendered using screen fonts
  • the text to OCR may or may not be anti-aliased
  • anti-aliasing may or may not use RGB decimation (a.k.a. sub-pixel AA, ClearType, etc.)
  • the screenshot may be in RGB or BGR order
  • the baseline is trivial to find (just look at any screen font: the baseline appears very clearly and is easy to find algorithmically)
  • a lot of errors are allowed (character recognition doesn't need to be anywhere near 100% correct)
  • the fonts are basically known in advance, but exactly how they are rendered is not (the size is unknown, the color is unknown, the type of anti-aliasing is unknown). Basically, all that is known is that they will be very common fonts

So I suppose it's nowhere near as complicated as doing "real" OCR: finding the baseline and "cutting out" each character is quite easy to do (I've already done it; a rough sketch is shown below).
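For illustration, here is a minimal sketch of that segmentation step, assuming dark text on a light background and using simple projection profiles. The class name, luminance cut-off, and thresholds are all illustrative and untuned, not taken from any particular library:

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of the "easy part" described above: estimate the baseline of
 * a line of screen-rendered text and cut it into per-character slices using
 * simple projection profiles.
 */
public class GlyphSegmenter {

    /**
     * Rough baseline estimate from the horizontal projection profile:
     * the lowest row whose ink count is a sizable fraction of the peak.
     * Descender rows contain only a few ink pixels, so they fall below the cut.
     */
    public static int findBaseline(BufferedImage line) {
        int h = line.getHeight(), w = line.getWidth();
        int[] profile = new int[h];
        int peak = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (luminance(line.getRGB(x, y)) < 128) profile[y]++;  // "dark enough" counts as ink
            }
            peak = Math.max(peak, profile[y]);
        }
        for (int y = h - 1; y >= 0; y--) {
            if (profile[y] >= peak / 4) return y;  // 25% of peak is an arbitrary, untuned cut-off
        }
        return h - 1;
    }

    /** Cut the line into character candidates wherever a column contains no ink. */
    public static List<int[]> cutCharacters(BufferedImage line) {
        List<int[]> spans = new ArrayList<>();   // each entry is {startX, endXExclusive}
        int start = -1;
        for (int x = 0; x < line.getWidth(); x++) {
            boolean hasInk = false;
            for (int y = 0; y < line.getHeight(); y++) {
                if (luminance(line.getRGB(x, y)) < 128) { hasInk = true; break; }
            }
            if (hasInk && start < 0) start = x;          // entering a glyph
            if (!hasInk && start >= 0) {                 // leaving a glyph
                spans.add(new int[] { start, x });
                start = -1;
            }
        }
        if (start >= 0) spans.add(new int[] { start, line.getWidth() });
        return spans;
    }

    private static int luminance(int argb) {
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return (r * 299 + g * 587 + b * 114) / 1000;
    }
}
```

Note that column-based cutting like this will merge touching glyphs and split some anti-aliased ones; since a lot of errors are allowed, that may be acceptable here.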

Does anyone know about specific techniques, papers, or even APIs that allow doing such a feat?

Note that this question is not about screen-scraping, not about breaking CAPTCHAs, not about regular OCR (as in OCRing a scanned text), and not about GUI automation (although some may use it that way).

SyntaxT3rr0r
  • Please see if any of the answers to this question http://stackoverflow.com/q/896224/377657 applies to your situation. – rwong Jun 30 '11 at 07:23

2 Answers


I have good experience with invariant moments (for example Hu moments, though they may be a little too invariant for your purpose, since you have a predefined orientation) for feature extraction, paired with cluster analysis (I got really good results with the Mahalanobis distance). A rough sketch of the idea follows the project link below.
In case you are interested in a pure Java solution, here is our SF project:

http://sourceforge.net/projects/javaocr/

This also works on android phones.

(Help is welcome.)
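For concreteness, here is a rough sketch of the approach described above, assuming binarized glyph images: it computes a few Hu-style invariant moments per glyph and matches an unknown glyph against known classes with a Mahalanobis-style distance using a diagonal covariance (a simplification of the full covariance matrix). The class and method names are illustrative and are not part of the javaocr project:

```java
/**
 * Sketch only: invariant-moment features plus a simplified Mahalanobis-style
 * distance for nearest-class matching.
 */
public class MomentMatcher {

    /** First three Hu invariants of a binary glyph (true = ink); assumes at least one ink pixel. */
    public static double[] huMoments(boolean[][] glyph) {
        double m00 = 0, m10 = 0, m01 = 0;
        int h = glyph.length, w = glyph[0].length;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (glyph[y][x]) { m00++; m10 += x; m01 += y; }
        double cx = m10 / m00, cy = m01 / m00;   // centroid

        double mu20 = 0, mu02 = 0, mu11 = 0, mu30 = 0, mu03 = 0, mu21 = 0, mu12 = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (glyph[y][x]) {
                    double dx = x - cx, dy = y - cy;
                    mu20 += dx * dx;      mu02 += dy * dy;      mu11 += dx * dy;
                    mu30 += dx * dx * dx; mu03 += dy * dy * dy;
                    mu21 += dx * dx * dy; mu12 += dx * dy * dy;
                }
        // Scale-normalised central moments: eta_pq = mu_pq / m00^(1 + (p+q)/2).
        double n20 = mu20 / Math.pow(m00, 2),   n02 = mu02 / Math.pow(m00, 2),
               n11 = mu11 / Math.pow(m00, 2),
               n30 = mu30 / Math.pow(m00, 2.5), n03 = mu03 / Math.pow(m00, 2.5),
               n21 = mu21 / Math.pow(m00, 2.5), n12 = mu12 / Math.pow(m00, 2.5);
        return new double[] {
            n20 + n02,                                                 // Hu 1
            Math.pow(n20 - n02, 2) + 4 * n11 * n11,                    // Hu 2
            Math.pow(n30 - 3 * n12, 2) + Math.pow(3 * n21 - n03, 2)    // Hu 3
        };
    }

    /** Mahalanobis distance simplified to a diagonal covariance (per-feature variances). */
    public static double mahalanobis(double[] features, double[] classMean, double[] classVariance) {
        double d = 0;
        for (int i = 0; i < features.length; i++) {
            double diff = features[i] - classMean[i];
            d += diff * diff / classVariance[i];
        }
        return Math.sqrt(d);
    }
}
```

The full Mahalanobis distance would use the inverse of each class's covariance matrix rather than just the per-feature variances; the diagonal version above is only meant to show the shape of the computation.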

Konstantin Pribluda
  • Scale invariance comes from the invariant moments; the Mahalanobis distance has nothing to do with it, it is from the cluster analysis domain. The SF project I referenced contains implementations of everything and also a working Android demo. – Konstantin Pribluda Jul 08 '11 at 13:21

You may try to implement a LAMSTAR network as described in Daniel Graupe's "Principles of Artificial Neural Networks" (1997), chapter 13.

It basically involves the following (a simplified sketch of the structure follows the list):

  • dividing your "input" into "subwords" (he takes the example of subdividing the image into sequences of pixels, one subword per column and one subword per row)
  • each subword is fed into a dynamic KSOM (Kohonen Self-Organising Map), which categorizes the normalized subword into a varying number of categories
  • each KSOM is a Winner-Take-All classifier, yielding 1 for one of its outputs and 0 for all the others
  • the outputs are then linearly combined with "link weights to the output layer" and passed through a non-linear activation function (e.g. the logistic function); the excitation of the output neurons gives you a bit sequence that represents the recognised character
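For illustration only, here is a heavily simplified Java sketch of that structure (not the full LAMSTAR training procedure from the book). The layer sizes, names, and random initialisation are placeholders, and the link weights are left untrained:

```java
import java.util.Random;

/**
 * Structural sketch of the layers described above: one winner-take-all SOM
 * layer per subword, whose winners' link weights are summed and squashed by a
 * logistic function to excite the output neurons.
 */
public class LamstarSketch {

    final double[][][] somWeights;   // [subword][neuron][feature] - one SOM layer per subword
    final double[][][] linkWeights;  // [subword][neuron][outputBit] - links to the output layer

    LamstarSketch(int subwords, int neuronsPerSom, int featuresPerSubword, int outputBits) {
        Random rnd = new Random(42);
        somWeights = new double[subwords][neuronsPerSom][featuresPerSubword];
        linkWeights = new double[subwords][neuronsPerSom][outputBits];  // would be set during training
        for (double[][] layer : somWeights)
            for (double[] neuron : layer)
                for (int i = 0; i < neuron.length; i++) neuron[i] = rnd.nextDouble();
    }

    /** Winner-take-all: index of the SOM neuron closest to the (normalized) subword. */
    int winner(int layer, double[] subword) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int n = 0; n < somWeights[layer].length; n++) {
            double d = 0;
            for (int i = 0; i < subword.length; i++) {
                double diff = subword[i] - somWeights[layer][n][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = n; }
        }
        return best;
    }

    /** Forward pass: subwords -> winning categories -> link weights -> output bit pattern. */
    double[] recognise(double[][] subwords) {
        double[] output = new double[linkWeights[0][0].length];
        for (int s = 0; s < subwords.length; s++) {
            int w = winner(s, subwords[s]);
            for (int o = 0; o < output.length; o++)
                output[o] += linkWeights[s][w][o];          // linear combination of link weights
        }
        for (int o = 0; o < output.length; o++)
            output[o] = 1.0 / (1.0 + Math.exp(-output[o])); // logistic activation
        return output;   // thresholding these gives the bit sequence encoding the character
    }
}
```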

The advantage of the LAMSTAR is that everything is traceable:

  • You can know what the NN sees by considering the input you feed it.
  • You can know what the NN considers it sees by observing the result of the classification by the KSOMs.
  • You can know what the NN wants to see by considering the weight vectors of a particular KSOM.
  • You can know what the NN really considers important (and what parts of the image it ignores) by comparing the link weights.
Laurent LA RIZZA