Tesseract-OCR not reading image re-sized with python pillow, but does with MS Paint

Question

I am new to Python. I am using Python 3.4.3 and Pillow to read text from an image. The image has black text on a white background.

The problem: when I resize the image ("captcha1.png") with MS Paint and run Tesseract on it, the text is read correctly. But when I resize the image with Python (using Pillow, code below), Tesseract does not read anything.

from PIL import Image

im1 = Image.open("captcha1.png")
width, height = im1.size
# Enlarge the image 5x in each dimension
im2 = im1.resize((int(width*5), int(height*5)), Image.ANTIALIAS)
im2.save("captcha2.png", dpi=(600.0,600.0))

I also tried

im2.save("captcha2.png", quality=95)

I use the following to run Tesseract on the image:

subprocess.call(['C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe', 'G:\\python\\captcha2.png', 'out', '-psm', '8'])
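"Nothing happens" can mean either that Tesseract failed or that it ran and wrote an empty result. A minimal sketch that separates the two cases is below. The helper names `tesseract_command` and `read_text` are hypothetical, and the sketch assumes a Tesseract 3.x install, which spells the page segmentation flag `-psm` as in the call above and writes its result to `<out_base>.txt`.

```python
import subprocess


def tesseract_command(exe, image_path, out_base, psm=8):
    """Build the argument list for a Tesseract 3.x call (hypothetical helper)."""
    return [exe, image_path, out_base, "-psm", str(psm)]


def read_text(exe, image_path, out_base, psm=8):
    """Run Tesseract and return the text it wrote to <out_base>.txt."""
    # check_call raises CalledProcessError if Tesseract exits with a non-zero status.
    subprocess.check_call(tesseract_command(exe, image_path, out_base, psm))
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read().strip()
```

If `read_text` returns an empty string without raising, Tesseract ran but recognized no text, which points at the image rather than at the command line.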

When I compare the image properties of the versions resized by Paint and by Python, they look the same (both 96 dpi, same dimensions, though the file sizes differ by about 2 KB).
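The file properties dialog does not show everything that can differ between two PNG files, such as the pixel mode (palette, grayscale, RGB, RGBA) or whether transparency is present. A small sketch for comparing those with Pillow follows. The helper name `describe_image` is hypothetical, and whether any of these properties is the actual cause here is only an assumption to check.

```python
from PIL import Image


def describe_image(path):
    """Return properties of an image file that can differ between editors."""
    with Image.open(path) as im:
        return {
            "format": im.format,        # e.g. "PNG"
            "mode": im.mode,            # e.g. "RGB", "RGBA", "P", "L", "1"
            "size": im.size,            # (width, height) in pixels
            "dpi": im.info.get("dpi"),  # None if the file stores no resolution
            # Alpha channel or a palette transparency entry
            "has_transparency": im.mode in ("RGBA", "LA") or "transparency" in im.info,
        }
```

Calling `describe_image` on the Paint version and on the Pillow version and printing both dictionaries side by side shows whether the two files differ in mode, transparency, or stored resolution.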

Can someone help?

Also, I did a little research and it seems 96 dpi is too low for OCR, but then I can't figure out why everything works fine with the image resized in Paint.

The `ANTIALIAS` method in PIL has a bug when you try to upsize the image, as you're doing here, that results in poor quality output. I don't know if they fixed it in Pillow. P.S. the `quality` setting doesn't do anything for `png` files. — Mark Ransom, May 19 '15 at 16:06
Thanks for replying. So, are there any alternatives you can suggest? I tried Image.BICUBIC, but got the same result. — user1581539, May 19 '15 at 18:38
You might try [`imagemagick`](http://stackoverflow.com/questions/7895278/can-i-access-imagemagick-api-with-python). — Mark Ransom, May 19 '15 at 18:43
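Since switching the resampling filter alone did not help, one thing that could be tried while staying in Pillow is to normalize the image before enlarging it: flatten any transparency onto white, convert to grayscale, resize, then threshold to pure black and white. This is only a sketch under the assumption that the pixel mode or an alpha channel is involved, which the thread does not confirm. The helper name `upscale_for_ocr` and the default threshold of 128 are illustrative choices.

```python
from PIL import Image


def upscale_for_ocr(im, factor=5, threshold=128):
    """Flatten, grayscale, enlarge and binarize an image (hypothetical helper)."""
    # Paste onto white so transparent pixels do not turn black when alpha is dropped.
    rgba = im.convert("RGBA")
    flat = Image.new("RGB", rgba.size, (255, 255, 255))
    flat.paste(rgba, mask=rgba.split()[3])
    gray = flat.convert("L")
    width, height = gray.size
    big = gray.resize((width * factor, height * factor), Image.BICUBIC)
    # Map every pixel to pure black (0) or pure white (255).
    return big.point(lambda p: 255 if p >= threshold else 0)
```

Usage would look like `upscale_for_ocr(Image.open("captcha1.png")).save("captcha2.png")`, followed by the same Tesseract call as in the question.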

