How to properly invert colours for OCR?

Question

I am currently trying to improve an OCR routine. The text I encounter is white on a varying background, so I want to turn the pure white text black and everything else white. Everything works fine until I need to invert the colours.

PIL's ImageOps.invert doesn't support this image mode (error below), so I have to convert the image first, but that gives bad results.

OSError: not supported for this image mode

My test image is this:

Which I can turn into:

But when I convert, invert and convert back, I end up with the colours/grayscale again.

So, currently, I can't find a way to get the result I want:

If I run OCR on the version with white text, I only get "Lampent used BS] gL [LL =e", but the version with black text reads perfectly.

Is there another way to invert my image? The only other approaches I found change one pixel at a time, with no good guidance for a beginner coder.

def readimg(image, write=False):
    import pytesseract
    from PIL import Image

    # opening an image from the source path
    if isinstance(image, str):
        img = Image.open(image)
        img = img.convert('RGBA')
    else:
        img = image
    img = img.convert('RGB')  # Worse results if not reconverted??

    img.show()
    # path where the tesseract module is installed
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

    # converts the image to result and saves it into result variable
    result = pytesseract.image_to_string(img)

    # Catch some garbage text: Tesseract appends a form feed ('\x0c') as a page separator
    result = result.replace('\x0c', '')
    result = result.strip()  # Clean newlines

    # append the text to output.txt in the current working directory
    if write:
        with open('output.txt', mode='a') as file:
            file.write(result + '\n')

    print(result)
    return result


def improve_img(image):
    from PIL import Image, ImageOps

    if isinstance(image, str):  # if link, open it
        im = Image.open(image)
    else:
        im = image
    im = im.convert('RGBA')

    thresh = 254  # https://stackoverflow.com/questions/9506841/using-python-pil-to-turn-a-rgb-image-into-a-pure-black-and-white-image
    fn = lambda x: 255 if x > thresh else 0
    r = im.convert('L').point(fn, mode='1')

    #r = im.convert('RGB')
    #r = ImageOps.invert(r)
    #r = im.convert('L')
    #r.save("test.png")
    #r.show()

    return r


if __name__ == '__main__':
    test = improve_img('img/testtext1.png')
    readimg(test)

score 1 · Accepted Answer · answered Sep 21 '20 at 07:10

The pitfall is this line:

r = im.convert('L').point(fn, mode='1')

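Although not explicitly stated in the documentation on PIL.ImageOps.invert, mode 1 is not supported: in Pillow 7.2.0 (the version listed below), invert only accepts modes L and RGB and raises exactly the OSError from the question for modes such as 1 or RGBA. Just stick to the common grayscale mode L. Since you're thresholding the image anyway, you only have the values 0 and 255, so inverting them is perfectly fine here.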
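To make the mode restriction concrete, here is a minimal sketch of a hypothetical helper, invert_any, that converts each mode into one that ImageOps.invert accepts before inverting. It assumes Pillow is installed; keeping the alpha channel untouched for RGBA images is my own design choice, not something the question or the Pillow docs prescribe.

```python
from PIL import Image, ImageOps


def invert_any(img: Image.Image) -> Image.Image:
    """Hypothetical helper: invert an image regardless of its mode.

    ImageOps.invert reliably handles 'L' and 'RGB', so other modes
    are converted to one of those first.
    """
    if img.mode in ("L", "RGB"):
        return ImageOps.invert(img)
    if img.mode == "RGBA":
        # Invert only the colour channels; the alpha channel is kept as-is.
        r, g, b, a = img.split()
        inverted = ImageOps.invert(Image.merge("RGB", (r, g, b)))
        return Image.merge("RGBA", (*inverted.split(), a))
    # Bilevel '1' (and e.g. palette 'P'): go through 8-bit grayscale first.
    return ImageOps.invert(img.convert("L"))


if __name__ == "__main__":
    # A 2x1 bilevel image: left pixel white, right pixel black.
    bw = Image.new("1", (2, 1), 0)
    bw.putpixel((0, 0), 255)
    out = invert_any(bw)
    print(out.mode, out.getpixel((0, 0)), out.getpixel((1, 0)))  # L 0 255
```

Note that the result of a mode 1 input comes back as mode L, which Tesseract handles fine.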

Here's a much shorter version:

from PIL import Image, ImageOps

im = Image.open('path/to/your/image.png')
r = ImageOps.invert(im.convert('L').point(lambda x: 255 if x > 254 else 0))
r.save('test.png')
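Because the threshold already produces only 0 and 255, the inversion can also be folded into the threshold itself by swapping the two output values, so ImageOps isn't needed at all. A minimal sketch, assuming the same threshold of 254 as above; white_text_to_black is a hypothetical name:

```python
from PIL import Image, ImageOps


def white_text_to_black(img: Image.Image, thresh: int = 254) -> Image.Image:
    """Hypothetical helper: pure-white pixels become black, all others white."""
    gray = img.convert("L")
    # For an 'L' image, point() evaluates the lambda once per grey level
    # (0..255) to build a lookup table, then applies it to the whole image,
    # so there is no Python loop over individual pixels.
    return gray.point(lambda x: 0 if x > thresh else 255)


if __name__ == "__main__":
    demo = Image.new("RGB", (3, 1))
    demo.putdata([(255, 255, 255), (200, 200, 200), (0, 0, 0)])
    one_step = white_text_to_black(demo)
    # Same result as thresholding first and inverting afterwards.
    two_step = ImageOps.invert(demo.convert("L").point(lambda x: 255 if x > 254 else 0))
    print(list(one_step.getdata()), list(two_step.getdata()))  # [0, 255, 255] [0, 255, 255]
```

This also sidesteps the per-pixel loops mentioned in the question, since the lookup table is applied by Pillow internally.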

Some visualization code for testing:

from matplotlib import pyplot as plt

plt.figure(0, figsize=(16, 4))
plt.subplot(121), plt.imshow(im), plt.title('Original image')
plt.subplot(122), plt.imshow(r, cmap='gray'), plt.title('Modified image')
plt.tight_layout()
plt.show()

----------------------------------------
System information
----------------------------------------
Platform:     Windows-10-10.0.16299-SP0
Python:       3.8.5
Matplotlib:   3.3.1
Pillow:       7.2.0
----------------------------------------
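If NumPy is available (Matplotlib already depends on it), the same black-on-white mask can be built on the whole array at once. The sketch below also offers a strict option that requires all three RGB channels to be exactly 255, because converting to L first blends the channels into a single value; whether that distinction matters depends on your images. The function name and the strict parameter are hypothetical.

```python
import numpy as np
from PIL import Image


def white_text_mask(img: Image.Image, thresh: int = 254, strict: bool = False) -> Image.Image:
    """Hypothetical helper: black where the text is white, white elsewhere.

    strict=False thresholds the grayscale image, as in the answer above;
    strict=True requires R, G and B to all equal 255.
    """
    if strict:
        rgb = np.asarray(img.convert("RGB"))   # shape (height, width, 3)
        is_text = np.all(rgb == 255, axis=-1)  # shape (height, width)
    else:
        gray = np.asarray(img.convert("L"))    # shape (height, width)
        is_text = gray > thresh
    # np.where operates on the whole array at once; no Python pixel loop.
    out = np.where(is_text, 0, 255).astype(np.uint8)
    return Image.fromarray(out)  # a 2-D uint8 array becomes an 'L' image
```

For example, white_text_mask(Image.open('img/testtext1.png')) could be passed straight to readimg from the question.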

