41

I would like to build an Android application that uses an OCR library to scan a picture and extract the text from it.

What Java library should I use?

systempuntoout
  • Google recently released an API to achieve this: https://developers.google.com/vision/text-overview – Wirling Jun 28 '16 at 09:49

4 Answers

21

Don't know how good it is (it definitely needs to be trained first), but there is Ron Cemer's Java OCR library.

Thilo
7

If you are looking for a very extensible option, or have a specific problem domain, you could consider rolling your own using the Java Object Oriented Neural Engine (JOONE).

I used it successfully in a personal project to identify the letter in an image such as this. You can find all the source for the OCR component of my application on GitHub, here.

davetapley
6

Try Tesseract. Check out this article: http://www.itwizard.ro/interfacing-cc-libraries-via-jni-example-tesseract-163.html and this example: http://code.google.com/p/mezzofanti/

Edit: some more facts:

- Tesseract is one of the best open-source OCR engines and is used by Google.
- Training data is available for many languages.
- Mezzofanti is an Android app that uses Tesseract.
- Beware: OCR uses a lot of CPU power. Trying to OCR an A4 page with your T-Mobile G1 will take a long time, and the result may not impress you ;-)
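Regarding the CPU warning above: one common way to reduce the work handed to an OCR engine is to shrink the image first. The following is only a minimal sketch of that idea, not something taken from Tesseract or Mezzofanti. It assumes the pixels are available as a packed ARGB `int[]` (the layout Android's `Bitmap.getPixels` produces), and the class and method names (`OcrPreprocess`, `toGray`, `halve`) are made up for illustration. Whether downscaling is acceptable depends on the text size, since shrinking small print too far will hurt recognition.

```java
/** Hypothetical helper: plain-Java preprocessing that also runs on Android. */
public final class OcrPreprocess {
    private OcrPreprocess() {}

    /** Converts packed ARGB pixels to 0..255 luminance values (integer Rec. 601 weights). */
    public static int[] toGray(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            gray[i] = (299 * r + 587 * g + 114 * b) / 1000;
        }
        return gray;
    }

    /**
     * Halves width and height by averaging each 2x2 block of grayscale values.
     * A trailing odd row or column is dropped. Output size is (width/2) x (height/2).
     */
    public static int[] halve(int[] gray, int width, int height) {
        int outW = width / 2;
        int outH = height / 2;
        int[] out = new int[outW * outH];
        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                int i = (2 * y) * width + 2 * x; // top-left pixel of the 2x2 block
                int sum = gray[i] + gray[i + 1] + gray[i + width] + gray[i + width + 1];
                out[y * outW + x] = sum / 4;
            }
        }
        return out;
    }
}
```

Calling `halve(toGray(pixels), w, h)` leaves a quarter of the original pixel count for the OCR step; the result would still have to be converted back into whatever image type the chosen OCR library expects.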

BlueWizard
raudi
  • Tesseract does work, but its reading ability is quite poor even for the simplest text. – mP. Jun 26 '11 at 11:19
  • That's why you have to train it, @mP. I was able to get good results with the default training while implementing an ISBN reader. Try this link; I haven't applied their experience yet, but I have had it in my bookmarks for a long time and I think it is a good source of info: http://vbridge.co.uk/2012/11/05/how-we-tuned-tesseract-to-perform-as-well-as-a-commercial-ocr-package/ – Srneczek Dec 15 '16 at 07:29
0

You can use the OCR feature of Google Docs. Check the Documents List Data API: http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#OCR

yeradis