Does anyone know a library in Python/Ruby that analyzes images and extracts the text inside them?
Or a book about image processing, etc.?
PS: The text is in various fonts and formats but is clear. TL;DR: no CAPTCHAs or anything similar.
You can use OpenCV, an open-source computer vision library that has a Python API. It is widely considered an industry-standard library nowadays.
OpenCV official site: http://opencv.org/
For OpenCV-Python tutorials, visit: opencvpython.blogspot.com
You can also check this Stack Overflow question: Simple Digit Recognition OCR in OpenCV-Python
In addition, the OpenCV samples include some OCR implementations.
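Simple digit-recognition OCR of this kind is commonly done with k-nearest-neighbour classification: each character is cropped, resized to a fixed small size, flattened into a vector, and labelled by majority vote among its closest training examples. A minimal pure-Python sketch of that idea follows, assuming hypothetical 3x3 "glyphs" in place of real cropped characters; for real use, OpenCV's `cv2.ml` module provides its own kNN classifier.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical training data: tiny 3x3 binary glyphs flattened to 9 values.
# A real system would crop each character from the image and resize it to a
# fixed size (e.g. 10x10) before flattening.
TRAINING = [
    ((0, 1, 0,
      0, 1, 0,
      0, 1, 0), "1"),
    ((0, 1, 0,
      1, 1, 0,
      0, 1, 0), "1"),
    ((1, 1, 1,
      1, 0, 1,
      1, 1, 1), "0"),
    ((0, 1, 0,
      1, 0, 1,
      0, 1, 0), "0"),
]


def classify(glyph, training=TRAINING, k=3):
    """Return the majority label among the k training glyphs closest to `glyph`."""
    # Sort training examples by Euclidean distance to the unknown glyph.
    neighbours = sorted(training, key=lambda item: dist(item[0], glyph))[:k]
    # Each of the k nearest neighbours casts one vote for its label.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

For example, `classify((0, 1, 0, 0, 1, 0, 0, 1, 0))` returns `"1"`. A larger `k` smooths out noisy training samples at the cost of blurring distinctions between similar characters.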
However, I would recommend Tesseract for OCR. It is one of the best open-source OCR engines; it was originally developed by HP and later taken over by Google.
Tesseract site: https://github.com/tesseract-ocr/tesseract
Pytesser, a Python wrapper for Tesseract: https://github.com/RobinDavid/Pytesser
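Python wrappers for Tesseract typically work by running the `tesseract` command-line program. The following is a minimal standard-library sketch of that idea, not Pytesser's own API. It assumes Tesseract 3.03 or later is installed and on `PATH` (passing `stdout` as the output base makes it print the text instead of writing a `.txt` file), and the function names are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract CLI.

    'stdout' as the output base tells tesseract to print the recognised text
    rather than write <outputbase>.txt; '-l' selects the trained language data.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image file and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the text tesseract prints
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Usage would be something like `print(ocr_image("scan.png"))`, where `scan.png` is a placeholder file name. Passing a list of arguments (rather than a shell string) avoids quoting problems with unusual file names.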
Also see this Stack Overflow question: How do I choose between Tesseract and OpenCV?
So you can use OpenCV to preprocess the image and then Tesseract to perform the OCR itself.
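One common preprocessing step is binarization with Otsu's method, which picks the gray level that best separates "ink" from "paper". In OpenCV this is a single call, `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The pure-Python sketch below shows what that call computes, assuming an 8-bit grayscale image stored as a list of rows with dark text on a light background; the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit grayscale values.

    Pixels <= threshold form one class and pixels > threshold the other;
    the threshold that maximises the between-class variance is chosen.
    """
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1
    total = sum(histogram)
    sum_all = sum(i * count for i, count in enumerate(histogram))

    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of the intensities of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += histogram[t]
        sum_bg += t * histogram[t]
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue  # one class is empty, so this split is meaningless
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor of 1 / total**2).
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a 2-D list of grayscale values to 0 (dark text) or 255 (background)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]
```

For example, `binarize([[20, 30, 200], [25, 210, 220]])` returns `[[0, 0, 255], [0, 255, 255]]`. A clean black-and-white image like this is usually a better input for Tesseract than a noisy color scan.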