
I'm writing a script that takes an image and crops it down to just the number I want recognized. That part works fine. The numbers are either single or double digits.

I've tried Google's Vision API, which works and gives the correct result, but I would rather do this locally to avoid the fees for that service. I'm currently trying Tesseract OCR: https://github.com/tesseract-ocr/tesseract

Example of an image I want it to recognize: [image not included]

Tesseract is a command-line program, but I'm calling it from a Python file that also handles the rest of my script. I'm not sure whether Tesseract is the right tool or whether there is a better solution to my problem.

```
sudo tesseract imgName outputFile
```

No matter which image I run through it, the only result I get is a return value of 0 and the message "Empty page!!".
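One possible reading of the 0 is that it is the command's exit status (0 means success) rather than the recognized text; the CLI writes the text itself to `outputFile.txt`. Below is a minimal sketch, assuming Tesseract 4.x is on `PATH` and using only the Python standard library, that runs the CLI from Python and captures the text by passing `stdout` as the output base. `--psm 7` (treat the image as a single text line) and `-c tessedit_char_whitelist=0123456789` are real Tesseract options; `build_tesseract_command` and `ocr_with_cli` are hypothetical helper names.

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, psm: int = 7) -> List[str]:
    """Build a tesseract CLI call that prints its result to stdout.

    Passing "stdout" as the output base makes tesseract write the text to
    standard output instead of creating a .txt file.
    """
    return [
        "tesseract",
        image_path,
        "stdout",
        "--psm", str(psm),  # 7 = treat the image as a single text line
        "-c", "tessedit_char_whitelist=0123456789",  # allow digits only
    ]


def ocr_with_cli(image_path: str, psm: int = 7) -> str:
    """Run tesseract and return the recognized text (not the exit status)."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm),
        capture_output=True,  # needs Python 3.7+
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

Calling `ocr_with_cli("digit.png")` would return whatever Tesseract recognized as a string, with `digit.png` standing in for the cropped image. The LSTM engine in some 4.0 releases reportedly ignores the whitelist; it is supported again from 4.1.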

EDIT:

I am now using pytesseract with this code:

```
print(pytesseract.image_to_string(img))
```

That prints nothing, so I tried:

```
print(pytesseract.image_to_string(img, config='--psm 6'))
```

which outputs random letters it seems to be guessing. Is there a way to make Tesseract look only for digits, so the results are narrowed down?
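A minimal sketch of a digits-only call, assuming `pytesseract` and Pillow are installed and Tesseract 4.1 or later (older 4.0 LSTM builds may ignore the whitelist). The `config` string is passed through to the tesseract CLI, so the same `--psm` and `-c` options apply. `--psm 7` treats the image as a single text line and `--psm 8` as a single word; either may suit a one- or two-digit crop better than `--psm 6`, which assumes a uniform block of text. `digits_config`, `parse_number`, and `read_number` are hypothetical helper names.

```python
import re
from typing import Optional


def digits_config(psm: int = 7) -> str:
    """Tesseract options: given page segmentation mode, digits only."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def parse_number(text: str) -> Optional[int]:
    """Pull the first run of digits out of the OCR text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def read_number(image_path: str, psm: int = 7) -> Optional[int]:
    # Imported lazily so the helpers above work without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, config=digits_config(psm))
    return parse_number(text)
```

Parsing the string into an `int` also drops the trailing newline and form-feed characters that `image_to_string` output typically ends with.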

  • Please edit into your question your current code and a sample image and show what you want/expect the output to be. – DisappointedByUnaccountableMod Aug 06 '19 at 21:46
  • What research have you done - and why did that searching not get the results you wanted? – DisappointedByUnaccountableMod Aug 06 '19 at 21:49
  • By “edit into your question your current code” I meant: please edit into your question a Minimal Complete Verifiable Example https://www.stackoverflow.com/help/mcve so people reading this question can copy/paste your example into a file and reproduce the exact same problem you are seeing, i.e. without editing or adding anything. – DisappointedByUnaccountableMod Aug 06 '19 at 21:51
  • The Python library "pytesseract" is what you're looking for. Read your image with PIL or OpenCV and then pass it to pytesseract's image_to_string method to get the text out of it. – asanoop24 Aug 06 '19 at 21:56
  • It may work better with black text on a white background. See the documentation: [Improving the quality of the output](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) – furas Aug 06 '19 at 21:57
  • I tried your image with Tesseract v4.0.0-beta.1 and it gives me `>3` in `outputFile.txt`. If you run it from Python with a subprocess function and take its result, you may get 0, which is the exit status meaning "OK" (no errors), not the recognized text. Better to use `pytesseract`. – furas Aug 06 '19 at 22:04
  • When I try pytesseract, it outputs nothing, just a blank line. – jimmyshadow1 Aug 06 '19 at 22:08
  • @jimmyshadow1 Post the code you're using with pytesseract. – asanoop24 Aug 06 '19 at 22:11
  • `print(pytesseract.image_to_string(img))` where `img` is the path to my image. I also tried `Image.open(img)`. – jimmyshadow1 Aug 06 '19 at 22:13
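Following the suggestion about black text on a white background, here is a minimal preprocessing sketch, assuming Pillow is installed. It loosely follows the kinds of steps described on Tesseract's "Improving the quality of the output" page (inversion, rescaling, binarization, adding a border), but the 128 cutoff, the 3x scale factor, and the 10-pixel border are arbitrary starting values, and `mostly_dark`, `binarize`, and `prepare_for_ocr` are hypothetical names.

```python
from typing import Sequence


def mostly_dark(gray_pixels: Sequence[int]) -> bool:
    """True if mean 0-255 brightness is below mid-grey (likely light text on dark)."""
    return sum(gray_pixels) / len(gray_pixels) < 128


def binarize(value: int, cutoff: int = 128) -> int:
    """Map one grayscale value to pure black (0) or pure white (255)."""
    return 255 if value >= cutoff else 0


def prepare_for_ocr(image_path: str, scale: int = 3, border: int = 10):
    """Return a black-on-white, upscaled, bordered copy of the crop (Pillow image)."""
    from PIL import Image, ImageOps  # imported lazily; requires Pillow

    with Image.open(image_path) as img:
        gray = ImageOps.grayscale(img)  # independent copy in mode "L"
    if mostly_dark(list(gray.getdata())):
        gray = ImageOps.invert(gray)  # make the text dark and the background light
    # Enlarge small crops using Pillow's default resampling filter.
    gray = gray.resize((gray.width * scale, gray.height * scale))
    bw = gray.point(binarize)  # threshold every pixel to black or white
    return ImageOps.expand(bw, border=border, fill=255)  # white margin around digits
```

The returned image could then be passed to `pytesseract.image_to_string` in place of the raw crop; the cutoff and scale are the first values worth tuning if results stay poor.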
