I am looking for suggestions on how to solve a problem where tesserocr fails to recognize one line of text in an image.
This is the image. It comes from the question "Simple Digit Recognition OCR in OpenCV-Python".
The code:
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('test3.png')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # Detect text lines; each entry is (image, box dict, block id, paragraph id)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # Restrict recognition to the current text line
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        result = (u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
                  "confidence: {1}, text: {2}").format(i, conf, ocrResult, **box)
        print(result)
The output is:
Found 5 textline image components.
Box[0]: x=10, y=5, w=582, h=29, confidence: 81, text: 9821480865132823066470938
Box[1]: x=9, y=55, w=581, h=30, confidence: 91, text: 4460955058223172535940812
Box[2]: x=10, y=106, w=575, h=30, confidence: 90, text: 8481117450284102701938521
Box[3]: x=12, y=157, w=580, h=30, confidence: 0, text:
Box[4]: x=11, y=208, w=581, h=30, confidence: 89, text: 6442881097566593344612847
It did not recognize the digits in box 3 (confidence 0, empty text). What should I add or change in the script so that box 3 shows the proper result?
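One direction that might be worth trying is sketched below. It is untested against this image and is not a confirmed fix. It makes three assumptions: that the detected box for the failing line is cropped too tightly, so a few pixels of padding may help; that each rectangle contains exactly one text line, so `PSM.SINGLE_LINE` is appropriate; and that the image contains only digits, so a `tessedit_char_whitelist` is safe (some Tesseract versions ignore the whitelist). The helper names `pad_box` and `ocr_lines` and the default `margin=5` are hypothetical choices for illustration.

```python
def pad_box(box, img_w, img_h, margin=5):
    """Grow a tesserocr box dict by `margin` pixels, clamped to the image.

    Returns (left, top, width, height), the argument order of SetRectangle.
    """
    x0 = max(box['x'] - margin, 0)
    y0 = max(box['y'] - margin, 0)
    x1 = min(box['x'] + box['w'] + margin, img_w)
    y1 = min(box['y'] + box['h'] + margin, img_h)
    return x0, y0, x1 - x0, y1 - y0


def ocr_lines(path, margin=5):
    """Return a list of (text, confidence) pairs, one per detected text line."""
    # Imported here so pad_box can be used without these packages installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, PSM, RIL

    image = Image.open(path)
    img_w, img_h = image.size
    results = []
    with PyTessBaseAPI() as api:
        api.SetImage(image)
        # Assumption: the image contains only digits.
        api.SetVariable('tessedit_char_whitelist', '0123456789')
        boxes = api.GetComponentImages(RIL.TEXTLINE, True)
        # Assumption: every rectangle below holds a single line of text.
        api.SetPageSegMode(PSM.SINGLE_LINE)
        for _, box, _, _ in boxes:
            api.SetRectangle(*pad_box(box, img_w, img_h, margin))
            results.append((api.GetUTF8Text().strip(), api.MeanTextConf()))
    return results
```

Calling `ocr_lines('test3.png')` and comparing the entry for box 3 with different `margin` values would show whether padding or the segmentation mode makes any difference here.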
Thank you for your help.