How to change pixel colours in an image using numpy

Question

I have an image that is converted to a numpy array

np_image = np.array(Image.open(filename))

I am attempting OCR using pytesseract, but the OCR fails when the text is red or yellow, so I want all text to be black.

I am breaking down the image into snippets as I know where the individual text elements appear

As you can see from the code below, I have attempted to find coloured pixels and convert them to black, but it does not work

snippet = np_image[top: bottom, field.left: field.right].copy()
for row in snippet:
    for pixel in row:
        test = [rgb for rgb in pixel[:-1]]
        test.sort()
        if test[0] > 190 and test[2] < 100:
            pixel = [0, 0, 0, 255]

text = pytesseract.image_to_string(snippet)

What should I do?

`test.sort` should do nothing, the `()` are missing – Jérôme Richard, Jan 09 '22 at 15:03
@JérômeRichard ta. Still doesn't fix it :) – Psionman, Jan 09 '22 at 15:10

score 1 · Accepted Answer · answered Jan 09 '22 at 15:21

pixel = [0, 0, 0, 255] does not write to the actual image. It only rebinds the local variable pixel to the new list [0, 0, 0, 255]. Assuming np_image is a Numpy array, you need to write pixel[:] = [0, 0, 0, 255] so that the array itself is modified. Note that snippet is a copy, so the write changes snippet (which is what gets passed to pytesseract) and not np_image.
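The difference between rebinding and writing in place can be checked on a small array. This is a minimal sketch: the 2x2 RGBA array is a hypothetical stand-in for the real snippet, not data from the question.

```python
import numpy as np

# Hypothetical 2x2 RGBA snippet (all white, opaque) standing in for the real crop
snippet = np.full((2, 2, 4), 255, dtype=np.uint8)

# Rebinding: `pixel` is pointed at a new list, the array is left untouched
for row in snippet:
    for pixel in row:
        pixel = [0, 0, 0, 255]
after_rebinding = snippet.copy()

# In-place write: `pixel` is a view into `snippet`, so slice assignment
# changes the underlying array
for row in snippet:
    for pixel in row:
        pixel[:] = [0, 0, 0, 255]
after_in_place = snippet.copy()
```

Here after_rebinding is still all white, while after_in_place is black with full alpha.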

Moreover, sort arranges the values in increasing order, so test[0] <= test[2] always holds. If test[0] > 190 is true, then test[2] is also greater than 190, and test[2] < 100 cannot be true. As a result, the condition as written is never true.
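One possible reading of the intended test is "the largest channel is bright and the smallest is dark", which would match pure red and yellow but not white or black. That reading is an assumption, as are the helper name blacken_coloured and its high and low parameters. Under that assumption, a vectorised sketch that avoids the Python loops could look like this:

```python
import numpy as np

def blacken_coloured(snippet, high=190, low=100):
    """Return a copy of an RGBA array with strongly coloured pixels set to black.

    Assumption: a pixel counts as coloured when its largest RGB channel
    exceeds `high` and its smallest RGB channel is below `low`.
    """
    rgb = snippet[..., :3]  # ignore the alpha channel
    mask = (rgb.max(axis=-1) > high) & (rgb.min(axis=-1) < low)
    out = snippet.copy()
    out[mask] = [0, 0, 0, 255]  # opaque black, as in the question
    return out

# Hypothetical test pixels: red, yellow, white, black
pixels = np.array(
    [[[255, 0, 0, 255], [255, 255, 0, 255]],
     [[255, 255, 255, 255], [0, 0, 0, 255]]],
    dtype=np.uint8,
)
result = blacken_coloured(pixels)
```

With these thresholds the red and yellow pixels become black and the white pixel is left alone. Real scans are anti-aliased, so the thresholds would likely need tuning.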

Jérôme Richard


For anyone looking for an answer to a similar question, I now realise that this is the WRONG approach to improving accuracy in pytesseract OCR. [See this SO question and answers](https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy) – Psionman Jan 10 '22 at 12:34
