How do I read the color of text from an image in Python?

Question

I am building a project that reads text from images. I also need to determine what color the text is written in. The images are computer generated and always consist of numbers. I am using PyTesseract for OCR. Can anyone suggest how I can do this?

Sample Image

For example, I need my Python code to produce something like: 429.05 Green

My code is below:

import pytesseract
import cv2

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = cv2.imread("D:\\test2.png")
text = pytesseract.image_to_string(img)

print(text)

You can use webcolors to get the color. Ref: https://pypi.org/project/webcolors/1.3/ — AnonyMouze, Aug 02 '20 at 13:43
Thanks AnonyMouze, can you show a Python example? The images only contain numbers, the only possible colors are green, red and black, and each image uses just one color. — Gaurav Shah, Aug 02 '20 at 13:53
Requesting libraries/software is off-topic for StackOverflow. — DisappointedByUnaccountableMod, Aug 02 '20 at 13:57

score 3 · Accepted Answer · answered Aug 02 '20 at 14:11

This could be done with the Pillow library.

First, import the library and use the getcolors method to obtain the color palette, sorting it by pixel count in ascending order.

from PIL import Image
i = Image.open("D:\\test2.png")

colors = sorted(i.getcolors())

For your image, colors is now a list of tuples, where the first item in each tuple is the number of pixels containing that colour, and the second item is another tuple giving the RGB colour code.
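One caveat worth knowing: getcolors takes a maxcolors argument (default 256) and returns None when the image contains more distinct colours than that, which can happen with anti-aliased text. Here is a minimal sketch of a way around it, using a small synthetic image instead of your file. The helper name sorted_colors and the demo image are made up for illustration.

```python
from PIL import Image

def sorted_colors(image):
    """Return (count, rgb) tuples sorted by pixel count, ascending."""
    rgb = image.convert("RGB")  # normalise palette/RGBA images to RGB tuples
    # Allow one entry per pixel so getcolors() cannot return None.
    return sorted(rgb.getcolors(maxcolors=rgb.width * rgb.height))

# Synthetic stand-in for a real image: white background with a green block.
demo = Image.new("RGB", (10, 10), (255, 255, 255))
demo.paste((76, 175, 80), (2, 2, 6, 6))  # fills a 4x4 region with green

demo_colors = sorted_colors(demo)
print(demo_colors)
```

On the synthetic image this gives two entries, the green block first and the white background last, in the same shape as the list described above.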

The last item in the list is the one with the most pixels (white):

>>> colors[-1]
(2547, (255, 255, 255))

The second-to-last item is probably the colour you want:

>>> colors[-2]
(175, (76, 175, 80))

This can then be converted to a hex code:

>>> '#%02x%02x%02x' % colors[-2][1]
'#4caf50'

You can quickly confirm this with a web-based hex picker.

This looks correct for your test image, but you may need to tweak slightly if the images you are working on vary.
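If the exact RGB value shifts between images (for example through anti-aliasing), one possible tweak is to classify the colour by hue and saturation instead of relying on a single exact code. The sketch below assumes the three colours mentioned in the comments (green, red, black). The function name classify_text_color, the saturation threshold and the hue band are all assumptions you would need to tune.

```python
import colorsys

def classify_text_color(rgb, min_saturation=0.2):
    """Map an (R, G, B) tuple to 'Green', 'Red' or 'Black'.

    min_saturation is an assumed threshold: anything greyer than this
    is treated as black text.
    """
    r, g, b = (channel / 255 for channel in rgb)
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    if saturation < min_saturation:
        return "Black"
    hue_degrees = hue * 360
    # Assumed hue band for green; any other saturated colour counts as red.
    return "Green" if 60 <= hue_degrees <= 180 else "Red"

print(classify_text_color((76, 175, 80)))  # prints "Green"
```

You would pass it the RGB tuple found above, i.e. classify_text_color(colors[-2][1]).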

score 1 · Answer 2 · answered Aug 02 '20 at 15:44

Thanks to all for the support. I cropped the part of the image containing the first character, then applied the steps suggested by @v25. The code is below.

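import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = Image.open("D:\\test1.png")

# image_to_boxes returns one line per character; fields 1-4 of the first
# line are the bounding box of the first character.
text = pytesseract.image_to_boxes(img).split(" ")
# Pad the box by 8 pixels vertically. Note: Tesseract measures y from the
# bottom of the image while crop() measures from the top, so this relies on
# the text being roughly vertically centred.
(left, upper, right, lower) = (int(text[1]), int(text[2]) - 8, int(text[3]), int(text[4]) + 8)
im_crop = img.crop((left, upper, right, lower))

# The second most common colour in the crop is taken to be the text colour.
colors = sorted(im_crop.getcolors())
hex_code = '#%02x%02x%02x' % colors[-2][1]  # avoid shadowing the built-in hex()

color = "Unknown"  # fallback so the print below never fails on None
if hex_code == '#91949a':
    color = "Black"
elif hex_code == '#4caf50':
    color = "Green"
elif hex_code == '#ff9d9d':
    color = "Red"

number = pytesseract.image_to_string(img).strip()  # drop trailing whitespace
print("Number is: " + number + " Color is: " + color)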
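A note on the crop coordinates, in case the crop misses the text on other images: as far as I know, each line of image_to_boxes output has the form "char left bottom right top page", with y measured from the bottom edge of the image, while Image.crop measures y from the top. Below is a minimal sketch of the conversion, written as a pure function so it can be tried without Tesseract. The name first_char_crop_box and the sample string are made up for illustration.

```python
def first_char_crop_box(boxes_text, image_height, padding=8):
    """Convert the first line of image_to_boxes output to a PIL crop box.

    Assumes each line is "char left bottom right top page" with the
    origin at the bottom-left corner of the image.
    """
    parts = boxes_text.splitlines()[0].split()
    left, bottom, right, top = (int(value) for value in parts[1:5])
    # Flip the y axis to PIL's top-left origin, then pad and clamp.
    upper = max(image_height - top - padding, 0)
    lower = min(image_height - bottom + padding, image_height)
    return (left, upper, right, lower)

# Hypothetical box output for a 40-pixel-high image.
sample = "4 10 12 22 30 0\n2 24 12 36 30 0\n"
print(first_char_crop_box(sample, image_height=40))  # prints (10, 2, 22, 36)
```

With a real image this would be called as img.crop(first_char_crop_box(pytesseract.image_to_boxes(img), img.height)).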
