5

I'm trying to run Tesseract from Python. This is my code:

import cv2
import os
import numpy as np
import matplotlib.pyplot as plt
import pytesseract
import Image
# def main():
jpgCounter = 0
for root, dirs, files in os.walk('/home/manel/Desktop/fotografias etiquetas'):
    for file in files:
        if file.endswith('.jpg'):
            jpgCounter += 1

for i in range(1, 2):

    name                = str(i) + ".jpg"
    nameBW              = str(i) + "_bw.jpg"
    img                 = cv2.imread(name,0) #zero -> abre em grayscale
    # img                 = cv2.equalizeHist(img)
    kernel = np.array([[0,-1,0], [-1,5,-1], [0,-1,0]])
    img = cv2.filter2D(img, -1, kernel)
    cv2.normalize(img,img,0,255,cv2.NORM_MINMAX)
    med                 = np.median(img)



    retval, threshold_manual    = cv2.threshold(img, med*0.6, 255, cv2.THRESH_BINARY)
    cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY,11,2)
    print(pytesseract.image_to_string(threshold_manual, lang='eng', config='-psm 11', nice=0, output_type=Output.STRING))

The error I'm getting is the following:

NameError: name 'Output' is not defined

Any idea why I'm getting this? Thank you!

mbc
  • 6
    Try writing `pytesseract.Output.STRING`. – Vasilis G. Jan 20 '18 at 14:17
  • 2
    @VasilisG. I corrected it to `output_type=pytesseract.Output.STRING` and got a different error: `AttributeError: 'module' object has no attribute 'Output'` – mbc Jan 20 '18 at 14:20
  • 1
    According to the [documentation](https://github.com/madmaze/pytesseract/blob/master/src/pytesseract.py#L163) of `pytesseract`, `output_type` has a default value of `Output.STRING`, so you can omit that argument, as well as the `nice` argument in your case. – Vasilis G. Jan 20 '18 at 14:27
  • 1
    @VasilisG. Thank you for your suggestion. The problem is that I get a different error when I do so: File "img_proce_clean2.py", line 35, in print(pytesseract.image_to_string(threshold_manual, config='-psm 11')) File "/home/manel/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 104, in image_to_string if len(image.split()) == 4: AttributeError: 'numpy.ndarray' object has no attribute 'split' – mbc Jan 20 '18 at 14:28

2 Answers

12

Add this import at the top of your script:

from pytesseract import Output
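
For illustration, here is a minimal sketch of how that import fits together with the call from the question. It assumes the installed pytesseract version actually exposes `Output` (the comments on the question suggest some versions do not); the helper name `ocr_text` is hypothetical, and the `config` string is copied from the question as-is.

```python
def ocr_text(image, config="-psm 11"):
    """Run Tesseract on `image` and return the recognized text as a string.

    `ocr_text` is a hypothetical helper name used only for this sketch.
    """
    # Binding the name `Output` is the actual fix: using a bare
    # `Output.STRING` without this import raises NameError.
    import pytesseract
    from pytesseract import Output

    return pytesseract.image_to_string(
        image,
        lang="eng",
        config=config,  # page segmentation flag, taken from the question
        nice=0,
        output_type=Output.STRING,
    )
```

With that in place, the last line of the question's loop would become `print(ocr_text(threshold_manual))`.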
Andrey Abramov
5

The problem is that you have installed the original pytesseract package (downloaded using pip) but are referring to the documentation of the madmaze GitHub version. The two are different.

I suggest uninstalling the present version, cloning the GitHub repo, and installing from the clone by following these steps:

  1. Uninstall present version:

    pip uninstall pytesseract

  2. Clone madmaze/pytesseract GitHub repo by either using git:

    git clone https://github.com/madmaze/pytesseract.git

    or download it directly by clicking here

  3. Get to the root directory of the cloned repo and run:

    pip install .
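
After reinstalling, one way to check whether the version you ended up with exposes `Output` is a small probe like the sketch below. The function name `has_output_enum` is hypothetical; it only tests for the attribute and does not run Tesseract.

```python
import importlib


def has_output_enum(module_name="pytesseract"):
    """Return True if the module imports and has an `Output` attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed (or not importable) at all.
        return False
    return hasattr(module, "Output")


if __name__ == "__main__":
    # True means `from pytesseract import Output` should work.
    print(has_output_enum())
```

If this prints `False`, the interpreter is still picking up a version without `Output`, which would also explain the `AttributeError` reported in the comments.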

skt7