
I am working on an OCR feature using Tesseract. It works fine locally, but I can't get it to work when I deploy to Google App Engine.

This is the line where the code breaks in the deployed app:

d = pytesseract.image_to_data(img, output_type=Output.DICT)

I get this error from gcloud app logs tail -s:

  File "/app/pol_flow.py", line 1587, in upload_ocr
    d = pytesseract.image_to_data(img, output_type=Output.DICT)
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 409, in image_to_data
    if get_tesseract_version() < '3.05':
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 118, in wrapper
    wrapper._result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pytesseract/pytesseract.py", line 327, in get_tesseract_version
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /app is not installed or it's not in your path


I know that I have to pass the location into the code like this:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

But I don't know the path where Tesseract is installed in the deployed environment.

Thanks for your help!

PS: I followed this answer but when I make the request to the deployed endpoint in App Engine I still get the same error: TesseractNotFoundError()

Ari
  • Hi, Could you please share the command that you are using? gcloud app deploy? And if you could add the flag '--verbosity=debug', and paste the result, please? – Joss Baron Jan 20 '20 at 22:19
  • Hi @BraulioBaron there I updated the question with the error from the log. The deploy works fine, the problem is when I make the request to the endpoint because Tesseract can't find where it is installed. – Ari Jan 21 '20 at 13:49
  • @BraulioBaron *gcloud app deploy pol-app.yaml --verbosity=debug* – Ari Jan 21 '20 at 13:53
  • Hi Ari, the Tesseract library requires platform packages that don't come with the App Engine Standard Python3 runtime. As a workaround you can check [this answer](https://stackoverflow.com/a/58302467/7757976) to deploy your app in Cloud Run. – llompalles Jan 21 '20 at 14:14
  • @llompalles I followed that answer but I still get the error when I deploy: 'TesseractNotFoundError() pytesseract.pytesseract.TesseractNotFoundError: /app is not installed or it's not in your path' – Ari Jan 21 '20 at 15:44
  • @llompalles When I run my code locally it works fine, but I need to specify this path: pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" How do I do that in the deployed app? I think I am installing Tesseract correctly, but App Engine can't find it. Thank you! – Ari Jan 21 '20 at 19:15

1 Answer


The short answer to your question is that this is not possible with App Engine Standard. You can achieve it with the App Engine Flexible environment: opt for a Custom Runtime with a Dockerfile, which lets you install everything you need. In this case, Tesseract can be installed simply by adding apt-get install tesseract-ocr.
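As a rough illustration of the Custom Runtime approach, a Dockerfile could look something like the sketch below. It is not taken from a working deployment: the base image tag, the requirements.txt file, the use of gunicorn, and the main:app entry point are all assumptions to adapt to your project. The app.yaml would also need runtime: custom and env: flex for App Engine to build from the Dockerfile.

```dockerfile
# Base image tag is an assumption; pick one that matches your Python version.
FROM python:3.9-slim

# Install the Tesseract binary with apt-get, as described above.
RUN apt-get update \
    && apt-get install -y tesseract-ocr \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# requirements.txt is assumed to list pytesseract, gunicorn and your other dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# App Engine Flexible sends requests to port 8080.
# "main:app" is a hypothetical module:variable name for the web app.
CMD ["gunicorn", "--bind", ":8080", "main:app"]
```

On Debian-based images the package normally puts the tesseract binary on the PATH, so a Windows-style tesseract_cmd override should not be needed inside the container.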

The other option would be to use Cloud Run, just as suggested by @llompalles. I have implemented the solution he shared and it works for me.
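Whichever option you choose, it may help to resolve the binary at runtime instead of hardcoding a path. The error text seems to echo whatever tesseract_cmd is set to, so "/app is not installed" suggests it was pointing at /app rather than at an executable. Below is a minimal sketch, assuming Tesseract was installed in the container and is on the PATH; find_tesseract and configure_pytesseract are hypothetical helper names, not part of pytesseract.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the absolute path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary, or fail with a clear message."""
    path = find_tesseract()
    if path is None:
        raise RuntimeError(
            "tesseract binary not found on PATH; "
            "install tesseract-ocr in the container image"
        )
    # Imported lazily so find_tesseract() can be used without pytesseract installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at startup, before the first image_to_data call, makes a missing binary fail early with a readable message instead of in the middle of a request.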

Joss Baron