9

I am using the pytesseract library to extract text from an image. This works fine when I run the code on localhost, but it fails with OSError: [Errno 2] No such file or directory when I deploy on OpenShift.

Below is the code I have written so far.

try:
  import Image
except ImportError:
  from PIL import Image
import pytesseract
filePath = PATH_WHERE_FILE_IS_LOCATED # '/var/lib/openshift/555.../app-root/data/data/y.jpg'
text = pytesseract.image_to_string(Image.open(filePath))  # this line produces error

The traceback of the error is:

>>> pytesseract.image_to_string(Image.open(filePath))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/var/lib/openshift/56faaee42d527151d5000089/app-root/runtime/repo/pytesseract/pytesseract.py", line 132, in image_to_string
boxes=boxes)
File "/var/lib/openshift/56faaee42d527151d5000089/app-root/runtime/repo/pytesseract/pytesseract.py", line 73, in run_tesseract
stderr=subprocess.PIPE)
File "/opt/rh/python27/root/usr/lib64/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/opt/rh/python27/root/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

But Image.open(filePath) itself succeeds and returns an image object:

 <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1366x768 at 0x7FC5A9F719D0>

How can I fix this error? Thanks in advance!

Suraj Palwe
  • Should it be `fpath`, or `filePath`? Your top code chunk shows you setting the path to `filePath` and using that, while your traceback shows `Image.open` being called on `fpath` – user5219763 Apr 14 '16 at 13:57
  • Sorry, I was testing so wrote fpath in terminal :( – Suraj Palwe Apr 14 '16 at 13:59
  • @SurajPalwe So it's solved? – jDo Apr 14 '16 at 14:01
  • NO. error is not solved – Suraj Palwe Apr 14 '16 at 14:07
  • Does localhost mean Windows? Maybe a case-sensitivity issue? – wenzul Apr 14 '16 at 14:15
  • @wenzul localhost is my machine, which is running Ubuntu 15.04! – Suraj Palwe Apr 14 '16 at 14:26
  • Is the tesseract binary in the path on the machine where it fails? It's the `subprocess.Popen(command,stderr=subprocess.PIPE)` that bombs, I can't think of any other file this would be looking for. – Dan Mašek Apr 17 '16 at 05:06
  • @DanMašek so what should I do to make it work ? – Suraj Palwe Apr 17 '16 at 05:11
  • @SurajPalwe First I would really verify that this is the case. If not, identify where the binary is actually located. Then either add it to the path, or in pytesseract.py change line `tesseract_cmd = 'tesseract'` appropriately (it's the one after `# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY`). – Dan Mašek Apr 17 '16 at 05:13
  • If you do `fh=Image.open(filePath)` and then `pytesseract.image_to_string(fh)`, you will probably get the same error. The problem is not with `filePath` but with something else. – dashesy Apr 17 '16 at 07:09
  • As mentioned [here](http://stackoverflow.com/questions/28741563/pytesseract-no-such-file-or-directory-error) install `tesseract-ocr` – LearnerEarner Apr 17 '16 at 09:52
  • @LearnerEarner I copied it over from where `dist-packages` is stored on my PC, but this problem arises on `openshift` – Suraj Palwe Apr 17 '16 at 11:24
  • Why copy paste? Why not install properly? You can use [`rhc ssh`](https://developers.openshift.com/getting-started/windows.html#remote-access) to run commands. An example - http://stackoverflow.com/questions/24572276/install-python-packages-on-openshift – LearnerEarner Apr 17 '16 at 20:04
  • Open a file convert to `buffer` before using. Source string not equal to source buffer ! Mean : `Don't work with tmp file object`(You try this : `open(open(x))`) – dsgdfg Apr 20 '16 at 15:31
  • As I answered in your previous post, **you do not have the tesseract binary installed on your system**. The error comes from `subprocess.call`ing the **missing tesseract binary**. – Łukasz Rogalski Apr 21 '16 at 13:15

6 Answers

4

Either you don't have tesseract-ocr installed on OpenShift, or it is not in your PATH. See https://pypi.python.org/pypi/pytesseract/0.1. Check that you can run the tesseract command from the command line on the server.
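The same PATH check can be done from Python. This is a minimal sketch, not part of pytesseract: `locate_executable` is a hypothetical helper name, and `shutil.which` requires Python 3.3 or later (on Python 2.7, `distutils.spawn.find_executable` plays a similar role).

```python
import os
import shutil


def locate_executable(name="tesseract"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


path = locate_executable()
if path is None:
    # Printing PATH helps compare the server environment with localhost.
    print("tesseract not found; PATH is:", os.environ.get("PATH", ""))
else:
    print("tesseract found at", path)
```

If the binary turns out to be installed somewhere outside PATH, one option raised in the comments is to change the `tesseract_cmd = 'tesseract'` line in pytesseract.py to point at that location.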

Konstantin Svintsov
4

As mentioned here (http://stackoverflow.com/questions/28741563/pytesseract-no-such-file-or-directory-error), install tesseract-ocr.

You can use rhc ssh to run commands on the server. More Windows-specific details can be found here (https://developers.openshift.com/getting-started/windows.html#remote-access).

LearnerEarner
4

IMHO, and if I understand OpenShift correctly, it may be like Heroku, where the filesystem is volatile and paths can differ slightly or entirely from your local ones. So, first check that:

  1. the paths are the same as in your local dev environment
  2. the paths exist
  3. you have sufficient permissions to access the files in those paths
  4. Please also check the OpenShift docs, especially the part about the file system.
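The first three checks can be scripted. The following is a minimal sketch using only the standard library; `diagnose_path` is a hypothetical helper name and the path passed to it is a placeholder, not the real location of the image.

```python
import os


def diagnose_path(path):
    """Report whether `path` exists, is a regular file, and is readable."""
    return {
        "exists": os.path.exists(path),
        "is_file": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
    }


# Placeholder path: substitute the actual image location on the server.
print(diagnose_path("/path/to/y.jpg"))
```

Note that in the question `Image.open(filePath)` already succeeds, so these checks are expected to pass for the image itself; they are mainly useful for ruling the image path out.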

I hope this was helpful.

4

Try this code and check where the error occurs:

try:
    import Image
    print("image not from PIL")
except ImportError:
    print("image from PIL")
    from PIL import Image
import pytesseract
import os
filePath = PATH_WHERE_FILE_IS_LOCATED # '/var/lib/openshift/555.../app-root/data/data/y.jpg'
# Confirm the image file exists before trying to open it
if not os.path.exists(filePath):
    print("no image file")
I = None
try:
    I = Image.open(filePath)
except Exception as e:
    raise RuntimeError("Can't open image because %s" % e)
text = pytesseract.image_to_string(I)  # this line produces error

PS: You can print module versions like this:

print(Image.__version__)
Valeriy Solovyov
  • The last line, `pytesseract.image_to_string(I)`, produces the error! The statements above it work fine. I think there must be some code in `pytesseract` that accesses the file system in the normal way, as on a Windows or Unix system, but on OpenShift we have to access it using `ENV_VARIABLES` – Suraj Palwe Apr 23 '16 at 08:54
3

I think you may not have entered the correct path to the image. Double-check your paths.

Also, have you verified the installation of tesseract-ocr? Check that importing the module produces no errors and that the tesseract command runs from the command line.

And as Wuelfhis Asuaje says, you should make sure you have sufficient permissions to access the files in the path.

Vin
2

You should install Google's tesseract-ocr from http://code.google.com/p/tesseract-ocr/.

Make sure the tesseract command is available on the server.

Under the hood, pytesseract invokes the tesseract command with subprocess (https://github.com/madmaze/pytesseract/blob/master/src/pytesseract.py#L93):

proc = subprocess.Popen(command,
            stderr=subprocess.PIPE)

Now guess what happens if the command is not available?

In [45]: subprocess.Popen(['tesseract'])
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-45-f4e9dd5a7f0b> in <module>()
----> 1 subprocess.Popen(['tesseract'])

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags)
    708                                 p2cread, p2cwrite,
    709                                 c2pread, c2pwrite,
--> 710                                 errread, errwrite)
    711         except Exception:
    712             # Preserve original exception in case os.close raises.

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in _execute_child(self, args, executable, preexec_fn, close_fds, cwd, env, universal_newlines, startupinfo, creationflags, shell, to_close, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
   1333                         raise
   1334                 child_exception = pickle.loads(data)
-> 1335                 raise child_exception
   1336
   1337

OSError: [Errno 2] No such file or directory
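To make this failure easier to recognise, the missing-binary case can be caught and reported explicitly. This is a minimal sketch of the idea, not pytesseract's actual code: `run_command` is a hypothetical wrapper, and it only assumes that `subprocess.Popen` raises `OSError` with `errno.ENOENT` when the executable cannot be found.

```python
import errno
import subprocess


def run_command(command):
    """Run `command` and return (returncode, stderr bytes).

    Raises RuntimeError with a clearer message if the executable is missing.
    """
    try:
        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    except OSError as e:
        if e.errno == errno.ENOENT:
            raise RuntimeError(
                "executable not found: %r; is it installed and on PATH?"
                % command[0]
            )
        raise
    _, stderr = proc.communicate()
    return proc.returncode, stderr
```

Calling `run_command(['tesseract'])` on a machine without the binary would then raise a RuntimeError naming the command instead of the bare "No such file or directory".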
satoru