How to use OpenCV and pytesseract to extract text from an image?

import cv2
import pytesseract
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt

img = Image.open('test.jpg').convert('L')
img.show()
img.save('test','png')
img = cv2.imread('test.png',0)
edges = cv2.Canny(img,100,200)
#contour = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#print pytesseract.image_to_string(Image.open(edges))
print pytesseract.image_to_string(edges)

But this gives the following error:

Traceback (most recent call last):
  File "open.py", line 14, in <module>
    print pytesseract.image_to_string(edges)
  File "/home/sroy8091/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 143, in image_to_string
    if len(image.split()) == 4:
AttributeError: 'NoneType' object has no attribute 'split'

sumitroy

2 Answers

If you want to do some pre-processing with OpenCV first (such as the edge detection in your code) and then extract the text, convert the result to a PIL image before passing it to pytesseract:

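import cv2
import pytesseract
from PIL import Image

# Read the image as grayscale and run edge detection with OpenCV
img = cv2.imread('test.png', 0)
edges = cv2.Canny(img, 100, 200)

# Wrap the NumPy array in a PIL image, which pytesseract accepts
img_new = Image.fromarray(edges)
text = pytesseract.image_to_string(img_new, lang='eng')
print(text)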
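
The conversion step can also be wrapped in a small helper that checks its input first. This is only a minimal sketch, not part of pytesseract or OpenCV: the name `array_to_pil` is hypothetical, and it assumes the array is a `uint8` image of the kind returned by `cv2.imread` or `cv2.Canny` (2-D grayscale, or 3-channel BGR).

```python
import numpy as np
from PIL import Image


def array_to_pil(arr):
    """Convert an OpenCV-style uint8 array to a PIL image (hypothetical helper)."""
    if arr is None:
        # cv2.imread returns None instead of raising when a file cannot be read
        raise ValueError("no image data: check that the input file exists")
    arr = np.asarray(arr)
    if arr.dtype != np.uint8:
        raise TypeError("expected a uint8 array, got %s" % arr.dtype)
    if arr.ndim == 2:
        # Single-channel image, e.g. the output of cv2.Canny
        return Image.fromarray(arr)
    if arr.ndim == 3 and arr.shape[2] == 3:
        # OpenCV stores colour images as BGR; PIL expects RGB
        return Image.fromarray(np.ascontiguousarray(arr[:, :, ::-1]))
    raise ValueError("unsupported array shape: %r" % (arr.shape,))
```

With such a helper, the OCR call would read `pytesseract.image_to_string(array_to_pil(edges), lang='eng')`.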
Deepan Raj

You cannot pass OpenCV objects directly to the pytesseract functions.

Try:

from PIL import Image
import pytesseract

image_file = 'test.png'
print(pytesseract.image_to_string(Image.open(image_file)))
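
Loading through PIL also makes a missing input file easy to catch before OCR runs. One thing worth checking in the question's script is that `img.save('test','png')` writes a file named `test`, not `test.png`, so the later read of `test.png` may not find the file. The sketch below is an illustration under that assumption; `load_grayscale` is a hypothetical helper, not a pytesseract or PIL function.

```python
import os
from PIL import Image


def load_grayscale(path):
    """Open an image file and return it as an 8-bit grayscale PIL image."""
    if not os.path.isfile(path):
        raise FileNotFoundError("image file not found: %s" % path)
    with Image.open(path) as img:
        # convert() returns a new, fully loaded image, so the file can be closed
        return img.convert("L")
```

The returned image can then be passed to `pytesseract.image_to_string(load_grayscale('test.png'))`.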