I am facing an issue. I am running a Python script that converts PDFs to images and then extracts their text with Tesseract (via pytesseract).

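import io

import pytesseract
from PIL import Image
from wand.image import Image as wi  # assumed: `wi` is Wand's Image class

extract = []

for filename in path_list:
    print(filename)
    # Render the PDF at 300 DPI and convert its pages to JPEG
    pdfFile = wi(filename=filename, resolution=300)
    image = pdfFile.convert('jpeg')

    # Serialize each page to an in-memory JPEG blob
    imageBlobs = []
    for img in image.sequence:
        imgPage = wi(image=img)
        imageBlobs.append(imgPage.make_blob('jpeg'))

    # Run OCR on each page and collect the text
    for imgBlob in imageBlobs:
        image = Image.open(io.BytesIO(imgBlob))
        text = pytesseract.image_to_string(image, lang='eng')
        extract.append(text)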

After extracting content from 11 PDFs, I get the following error. The problem is not with the PDF file itself: when I pass that particular PDF on its own, its content is extracted successfully. I am running the script on Ubuntu 16.04.

Any help would be appreciated.

Error:

  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 170, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "ocr_script.py", line 466, in <module>
    gather_details(path_list)
  File "ocr_script.py", line 45, in gather_details
    discover_data('Indexing',discoveryPath,final_meta,start_time)
  File "ocr_script.py", line 165, in discover_data
    text = pytesseract.image_to_string(image, lang='eng')
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 294, in image_to_string
    return run_and_get_output(*args)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 202, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/steve/.local/lib/python3.5/site-packages/pytesseract/pytesseract.py", line 172, in run_tesseract
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: /usr/bin/tesseract is not installed or it's
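
The first error above, OSError: [Errno 12], is raised when subprocess.Popen tries to start the tesseract process, which is consistent with the system running low on memory by that point in the batch. One way to check this is to log the Python process's peak memory after each PDF. This is only a minimal sketch, assuming Linux or macOS (the `resource` module is Unix-only); `peak_rss_mib`, `process_with_memory_log`, and `process_one` are hypothetical names, not part of the original script.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def process_with_memory_log(paths, process_one):
    """Call process_one(path) for each path and record peak memory after each.

    process_one is a hypothetical callable standing in for the per-PDF
    conversion and OCR step.
    """
    log = []
    for path in paths:
        process_one(path)
        log.append((path, peak_rss_mib()))
    return log
```

If the logged value keeps climbing from one file to the next, memory held from earlier PDFs is probably not being released before the next one is processed.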
Lasit Pant
  • Question possibly answered here [Python subprocess.Popen “OSError: Errno 12 Cannot allocate memory”](https://stackoverflow.com/questions/1367373/python-subprocess-popen-oserror-errno-12-cannot-allocate-memory) – Dmitrii Z. Jul 30 '18 at 19:40
  • @dmitrii I figured out that the memory issue here was with Tesseract – Lasit Pant Aug 01 '18 at 07:15

1 Answer

After further analysis and tweaks, I came to the conclusion that the problem was with my Tesseract setup rather than the OS. The changes I made:

  1. Edit the policy.xml file under /etc/ImageMagic..(version)

(Image: changes in the policy.xml file)

These are the parameters for which I increased the memory limits.
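
To see which limits are currently set before editing them, the resource entries can be listed with a short script. This is a rough sketch, assuming the usual policy.xml layout where limits appear as `<policy domain="resource" name="..." value="..."/>` elements. The sample values below are placeholders for illustration only, not the values I used, and `resource_limits` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative policy.xml content. The values are placeholders, not the
# ones from this answer.
SAMPLE_POLICY = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="coder" rights="none" pattern="PS"/>
</policymap>"""


def resource_limits(policy_xml):
    """Return a {name: value} dict of the resource limits in a policy.xml string."""
    root = ET.fromstring(policy_xml)
    return {
        policy.get("name"): policy.get("value")
        for policy in root.iter("policy")
        if policy.get("domain") == "resource"
    }


print(resource_limits(SAMPLE_POLICY))
```

For the real file, read its contents from wherever policy.xml lives on your system and pass the text to the same function.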

Lasit Pant