4

Does anyone know of a Python or Ruby library that analyzes images and extracts the text inside them?

Or a book about image processing, etc.?

PS: The text is in various fonts and formats but clear. TL;DR: no CAPTCHAs or similar.

Abid Rahman K
byterussian
  • 1
    What does the last line you wrote mean? Or was it written by mistake? – Rndm Jul 15 '12 at 07:16
  • possible duplicate of [OCR for recognising handwriting in .NET](http://stackoverflow.com/questions/591574/ocr-for-recognising-handwriting-in-net) – Adam Mihalcin Jul 15 '12 at 07:17
  • @Angelbit I pointed out one particular duplicate, but this question is really a duplicate of almost any OCR question on StackOverflow. – Adam Mihalcin Jul 15 '12 at 07:18
  • Sorry, my English is very poor. The text inside the images is written in various sizes and formats (bold, italic, etc.). – byterussian Jul 15 '12 at 07:20
  • 1
    @AdamMihalcin I have edited the question; I couldn't find any Ruby/Python-specific question. – byterussian Jul 15 '12 at 07:26
  • @Angelbit Ah, the original question didn't mention Python or Ruby. You might find [this question](http://stackoverflow.com/q/9690752/960195), [this question](http://stackoverflow.com/q/11489824/960195), or [this question](http://stackoverflow.com/q/9258825/960195) useful, then. – Adam Mihalcin Jul 16 '12 at 07:09

1 Answer

15

You can use OpenCV, an open-source computer vision library that has a Python API. It is widely regarded as an industry-standard library.

OpenCV official site : http://opencv.org/

If you need some tutorials on OpenCV-Python, visit : opencvpython.blogspot.com

You can also check this Stack Overflow question : Simple Digit Recognition OCR in OpenCV-Python

In addition, the OpenCV samples include some OCR implementations.

But I would recommend using Tesseract for OCR. It is widely regarded as one of the best open-source OCR engines; it was originally developed by HP and is now maintained by Google.

Tesseract site : https://github.com/tesseract-ocr/tesseract

Python wrapper for Tesseract, Pytesser : https://github.com/RobinDavid/Pytesser

Also check this Stack Overflow question : How do I choose between Tesseract and OpenCV?

So you can use OpenCV to preprocess the image and use Tesseract for OCR.
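A minimal sketch of that pipeline, under some assumptions: the `cv2` package (OpenCV's Python bindings) is installed, and a `tesseract` executable recent enough to accept `stdout` as its output base is on the PATH. The function names, the temporary file name `cleaned.png`, and the choice of Otsu thresholding as the preprocessing step are illustrative, not something the libraries prescribe. It calls the Tesseract command line directly rather than going through Pytesser.

```python
import subprocess
import tempfile
from pathlib import Path


def preprocess(image_path, out_path):
    """Convert the image to grayscale and binarize it with Otsu's method."""
    import cv2  # imported lazily so the module loads even without OpenCV

    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # With THRESH_OTSU the threshold is chosen automatically; the 0 is ignored.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(str(out_path), binary)
    return out_path


def tesseract_command(image_path, lang="eng"):
    """Build the CLI call; 'stdout' makes Tesseract print the recognized text."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Preprocess with OpenCV, then run Tesseract on the cleaned image."""
    with tempfile.TemporaryDirectory() as tmp:
        cleaned = preprocess(image_path, Path(tmp) / "cleaned.png")
        result = subprocess.run(
            tesseract_command(cleaned, lang),
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()
```

Calling `extract_text("scan.png")` (a hypothetical file name) would return whatever text Tesseract recognizes. For clear, printed text a simple global threshold like this is often enough; noisy or unevenly lit images may need other preprocessing.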

Community
Abid Rahman K