Hello,

I am working on a project that reports the colors in an image. I can already get each color's name and pixel count, but because the code goes through every pixel, it picks up every slightly different shade, even when the image is only black and white.

My question has two parts: how do I widen the tolerance so that all the near-white shades are reported simply as white, and is there a better way to extract the colors in an image along with their counts and percentages?

To summarize: the output lists far too many colors for an image that has only 2 colors to the eye, and I want a result that is close to the basic visible colors only.

I also keep only the colors that make up more than 1% of the image.

import pandas as pd
from PIL import Image
from collections import Counter

# Huge dictionary of colors; only two example entries are shown
colors_dict = {"FE0000": "Red", "C71585": "Red-violet"}

img = Image.open("Moustiques_Design_2.JPG")
size = w, h = img.size
data = img.load()

colors = []
for x in range(w):
    for y in range(h):
        color = data[x, y]
        hex_color_lower = ''.join([hex(c)[2:].rjust(2, '0') for c in color])
        hex_color = hex_color_lower.upper()
        colors.append(hex_color)

total = w * h

color_hex = []
color_count = []
color_percent = []

df = pd.DataFrame()
for color, count in Counter(colors).items():
    percent = count/total * \
        100  # Do not make it int. Majority of colors are < 1%, unless you want >= 1%
    if percent > 1:
        color_hex.append(color)
        color_count.append(count)
        color_percent.append(percent)

df['color'] = color_hex
df['count'] = color_count
df['percent'] = color_percent
df['color_name'] = df['color'].map(colors_dict)

df.to_excel(r'C:\Users\User\Desktop\Project\export_dataframe.xlsx',
            index=False, header=True)
print('done')

This is the outcome:

[image: screenshot of the current output]

Required output:

[image: screenshot of the required output]

A_K
  • What is your data and what is your desired output? Could you please provide them in a clear and reproducible way? – Bill Huang Oct 31 '20 at 23:08
  • I don't follow: column_name isn't correct, right? Do you want column_name without the different degrees? Could you please give an example of the output? – Carmoreno Nov 01 '20 at 00:00
  • I am trying to add a function that calculates the similarity between colors, like here https://stackoverflow.com/q/5392061/13176726, to reduce the outcome for similar color ranges. – A_K Nov 07 '20 at 03:19
  • Please provide sample images and the corresponding expected "answers". You should be aware that different people perceive colours to differing degrees, and that JPEG images blur and mix colours a lot more than PNG images do. There are also many formulae for calculating "colour difference": https://en.wikipedia.org/wiki/Color_difference – Mark Setchell Nov 08 '20 at 16:20
  • If it helps you solve your problem, please consider accepting an answer. Thanks! – jrouquie Nov 12 '20 at 18:43

2 Answers

I think I've found a solution to your problem. It is probably not the most efficient one, but it should work.

First, I modified your code a bit: the colours are sorted by their name in colors_dict and then by their count, and the loop stops at the first colour whose share is not above 1%, so it doesn't iterate through all of the colours.

Then I added two functions: color_bounds(color, bound) and check_bounds(bounds, color_hex).

color_bounds takes a colour and a range and returns the upper and lower limits of the colours that count as similar to it. For example, with a range of 2, the limits for the colour C738DE correspond to C93AE0 and C536DC (the function returns them as lists of decimal RGB values).

check_bounds then takes those limits and checks whether any of the more frequent colours that have already been kept falls within them; if one does, the new colour is not added.
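Taken together, the two functions amount to a per-channel tolerance test: two colours count as similar when each of their R, G and B values differs by at most the bound. Here is a minimal sketch of that test on its own. The names hex_to_rgb and is_similar are hypothetical helpers for illustration and are not part of the answer's code.

```python
def hex_to_rgb(color):
    """Split a 6-digit hex string such as 'C738DE' into integer (r, g, b)."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def is_similar(color_a, color_b, bound):
    """Return True when every channel of the two colours differs by at most `bound`."""
    return all(abs(a - b) <= bound
               for a, b in zip(hex_to_rgb(color_a), hex_to_rgb(color_b)))


print(is_similar("C738DE", "C93AE0", 2))  # True: each channel is exactly 2 away
print(is_similar("C738DE", "CA38DE", 2))  # False: the red channel differs by 3
```

Clamping to the 0..255 range is not needed in this form, because both inputs are already valid channel values.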

import pandas as pd
from PIL import Image
from collections import Counter

def color_bounds(color, bound):
    r, g, b = color[:2], color[2:4], color[4:]
    bounds = int(r, 16), int(g, 16), int(b, 16)

    upper_bounds = []
    lower_bounds = []
    # upper_bounds = ""
    # lower_bounds = ""
    for value in bounds:
        upper = value + bound
        lower = value - bound
        while upper > 255:
            upper -= 1
        while lower < 0:
            lower += 1

        """
        upper = hex(upper).split("x")[-1].upper()
        lower = hex(lower).split("x")[-1].upper()

        if len(upper) == 1:
            upper = "0" + upper
        if len(lower) == 1:
            lower = "0" + lower
        """
        
        upper_bounds.append(upper)
        lower_bounds.append(lower)
        # upper_bounds += upper
        # lower_bounds += lower

    return (upper_bounds, lower_bounds)

def check_bounds(bounds, colors):
    upper_bounds = bounds[0]
    lower_bounds = bounds[1]

    for color in colors:
        r, g, b = color[:2], color[2:4], color[4:]
        bounds = int(r, 16), int(g, 16), int(b, 16)

        similar = [False, False, False]
        for i in range(0, 3):
            if bounds[i] <= upper_bounds[i] and bounds[i] >= lower_bounds[i]:
                similar[i] = True

        if similar[0] and similar[1] and similar[2]:
            return False

    return True

colors_dict = {"000000": "Black", "FFFFFF": "White"}  # huge dictionary of colors; these are just two example entries

img = Image.open("image.jpg")
size = w, h = img.size
data = img.load()

colors = []
for x in range(w):
    for y in range(h):
        color = data[x, y]
        hex_color_lower = ''.join([hex(c)[2:].rjust(2, '0') for c in color])
        hex_color = hex_color_lower.upper()
        colors.append(hex_color)

total = w * h

color_hex = []
color_count = []
color_percent = []

df = pd.DataFrame()
def key(i):
    # Sort by colour name ("" when the hex code is not in colors_dict), then by count
    return colors_dict.get(i[0], ""), i[1]

colors = Counter(colors).items()
for color, count in sorted(colors, key=key, reverse=True):
    percent = count/total * \
        100  # Do not make it int. Majority of colors are < 1%, unless you want >= 1%
    if percent > 1:

        # New functions to ignore colours that are similar to more frequent colours
        # Make the bound value bigger to merge more shades (fewer colours in the output)
        # and smaller to keep more distinct colours
        bounds = color_bounds(color, 16)
        if check_bounds(bounds, color_hex):
            color_hex.append(color)
            color_count.append(count)
            color_percent.append(percent)
    else: break

df['color'] = color_hex
df['count'] = color_count
df['percent'] = color_percent
df['color_name'] = df['color'].map(colors_dict)

df.to_excel(r'export_dataframe.xlsx',
            index=False, header=True)

print('done')

With a little more time I could make the code much more efficient, but I think this answers your question for now. Please tell me if it was helpful :D

PS: You should be able to adjust the bound passed to color_bounds to include more or fewer colours.

PPS: I left the code that converts the bounds back to hex in color_bounds (commented out). If you want to use it, you will also need to add a step to check_bounds that converts the hex bounds back into decimal RGB values.
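If you do go the hex route, the round trip could look roughly like the sketch below. rgb_to_hex and hex_to_rgb are hypothetical helper names, not functions from the code above, and the sketch assumes the bounds are lists of three integers in the range 0 to 255, as color_bounds returns them.

```python
def rgb_to_hex(rgb):
    """Format three 0..255 integers as a 6-digit upper-case hex string."""
    return "".join(f"{value:02X}" for value in rgb)


def hex_to_rgb(color):
    """Convert a 6-digit hex string back into a tuple of decimal RGB values."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


upper_hex = rgb_to_hex([201, 58, 224])
print(upper_hex)              # C93AE0
print(hex_to_rgb(upper_hex))  # (201, 58, 224)
```

The f-string format 02X pads single-digit values with a leading zero, which replaces the manual length check in the commented-out code.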

Judev1

I suggest using numpy. Something along these lines:

import collections

import numpy as np
import pandas as pd
from PIL import Image

img = Image.open("Moustiques_Design_2.JPG")
data = np.array(img.getdata()) # data is a numpy array of shape (nb_pixels, 3)

colors = np.array([
    [255,255,255], # white
    [0,0,0], #  black
    [254,0,0], # red
    [199,21,133], # C71585 "Red-violet"
    # etc.
])

# difference of each pixel to each color (thanks to numpy it's fast)
differences = data.reshape((data.shape[0], 1, 3)) - colors

distances = np.linalg.norm(differences, axis=2) # distance of each pixel to each color
closest_color_indices = np.argmin(distances, axis=1) # index (in array colors) of the closest color to each pixel
color_counts = collections.Counter(closest_color_indices)

Then color_counts gives, for each color, the number of pixels that are approximately that color. For instance, if color_counts = Counter({0: 36675, 3: 5612, 2: 5864, 1: 2474}), it means that 36675 pixels were closer to white than to any other color in your dictionary, 2474 were approximately black, 5864 approximately red, 5612 closest to red-violet, etc. Then you can compute percentages and export to Excel as you already did in your code:

color_names = ['white', 'black', 'red', 'red-violet']
col_hex = []
col_count = []
col_percent = []
col_names = []
total = sum(color_counts.values())

for index, count in color_counts.items():
    print(index, count)
    percent = count/total * \
        100  # Do not make it int. Majority of colors are < 1%, unless you want >= 1%
    if percent > 1:
        rgb = tuple(colors[index])
        col_hex.append('%02x%02x%02x' % (rgb))
        col_count.append(count)
        col_percent.append(percent)
        col_names.append(color_names[index])
df = pd.DataFrame()
df['color'] = col_hex
df['count'] = col_count
df['percent'] = col_percent
df['color_name'] = col_names

df.to_excel(r'C:\Users\User\Desktop\Project\export_dataframe.xlsx',
            index=False, header=True)
print('done')

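Each difference between a color and a pixel is a vector of 3 values, and the code above measures its Euclidean length in RGB. You might want to consider another color space, such as YUV or one of the CIE spaces (for example CIELAB), which may match perceived color differences better.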
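As a rough illustration of the color space idea, the sketch below runs the same nearest-color search in YUV instead of RGB. It assumes the BT.601 conversion matrix and uses a few made-up pixels and a three-color palette; to_yuv and closest_palette_indices are hypothetical names chosen for this example.

```python
import collections

import numpy as np

# BT.601 RGB -> YUV conversion matrix (an assumption; other YUV variants exist)
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def to_yuv(rgb):
    """Convert an (n, 3) array of RGB values to YUV."""
    return np.asarray(rgb, dtype=float) @ RGB_TO_YUV.T


def closest_palette_indices(pixels, palette):
    """For each pixel, return the index of the nearest palette color in YUV."""
    differences = to_yuv(pixels)[:, np.newaxis, :] - to_yuv(palette)
    distances = np.linalg.norm(differences, axis=2)
    return np.argmin(distances, axis=1)


palette = np.array([[255, 255, 255], [0, 0, 0], [254, 0, 0]])  # white, black, red
pixels = np.array([[250, 248, 251], [12, 9, 10], [230, 20, 25], [5, 5, 5]])

indices = closest_palette_indices(pixels, palette)
color_counts = collections.Counter(indices.tolist())
print(indices)       # nearest palette index for each pixel
print(color_counts)  # number of pixels assigned to each palette index
```

With a real image, pixels would be the array built from img.getdata(), and the rest of the counting and export code stays the same.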

jrouquie