Python - Find dominant/most common color in an image

Question

I'm looking for a way to find the most dominant color/tone in an image using Python. Either the average shade or the most common color out of RGB will do. I've looked at the Python Imaging Library and could not find anything related to this in its manual. I also looked briefly at VTK.

I did, however, find a PHP script which does what I need, here (login required to download). The script seems to resize the image to 150x150 to bring out the dominant colors. After that, however, I am fairly lost. I did consider writing something that would resize the image to a small size and then check every other pixel or so for its color, though I imagine this would be very inefficient (implementing this idea as a C Python module might be an option).

However, after all of that, I am still stumped. So I turn to you, SO. Is there an easy, efficient way to find the dominant color in an image?

I'm guessing it resizes the picture to let the rescaling algorithm do some of the averaging for you. — Skurmedel, Jul 13 '10 at 22:09

score 91 · Accepted Answer · edited Sep 20 '22 at 01:04


Here's code making use of Pillow and Scipy's cluster package.

For simplicity I've hardcoded the filename as "image.jpg". Resizing the image is for speed: if you don't mind the wait, comment out the resize call. When run on the original sample image (a photo of two peppers, not reproduced here), it usually says the dominant colour is #d8c865, which corresponds roughly to the bright yellowish area to the lower left of the peppers. I say "usually" because the clustering algorithm used has a degree of randomness to it. There are various ways you could change this, but for your purposes it may suit well. (Check out the options on the kmeans2() variant if you need deterministic results.)

import binascii
from PIL import Image
import numpy as np
import scipy.cluster.vq

NUM_CLUSTERS = 5

print('reading image')
im = Image.open('image.jpg').convert('RGB')   # ensure exactly 3 channels (drops alpha/palette)
im = im.resize((150, 150))      # optional, to reduce time
ar = np.asarray(im)
shape = ar.shape
ar = ar.reshape(np.prod(shape[:2]), shape[2]).astype(float)

print('finding clusters')
codes, dist = scipy.cluster.vq.kmeans(ar, NUM_CLUSTERS)
print('cluster centres:\n', codes)

vecs, dist = scipy.cluster.vq.vq(ar, codes)           # assign each pixel to its nearest code
counts = np.bincount(vecs, minlength=len(codes))      # count pixels per code

index_max = np.argmax(counts)                         # find most frequent
peak = codes[index_max]
colour = binascii.hexlify(bytearray(int(c) for c in peak)).decode('ascii')
print('most frequent is %s (#%s)' % (peak, colour))

Note: when I increased the number of clusters from 5 to 10 or 15, it frequently gave results that were greenish or bluish. Given the input image, those are reasonable results too... I can't tell which colour is really dominant in that image either, so I don't fault the algorithm!
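Because scipy.cluster.vq.kmeans() starts from random centres, the reported colour can change between runs. Here is a minimal sketch of the kmeans2() route mentioned above, assuming a SciPy version whose kmeans2() accepts minit='++' and a seed keyword. The helper name dominant_colour_kmeans2() is hypothetical; it takes an (N, 3) float array like `ar` and returns the centre of the largest cluster.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def dominant_colour_kmeans2(pixels, k=5, seed=0):
    """Return ((r, g, b), hex) for the largest cluster of an (N, 3) float array."""
    # minit='++' seeds the centres with k-means++; a fixed seed makes runs repeatable
    centroids, labels = kmeans2(pixels, k, minit='++', seed=seed)
    counts = np.bincount(labels, minlength=len(centroids))   # pixels per cluster
    peak = centroids[np.argmax(counts)]
    rgb = tuple(int(round(c)) for c in peak)
    return rgb, '%02x%02x%02x' % rgb


# tiny synthetic "image": 90 red pixels and 10 blue ones
pixels = np.array([[255, 0, 0]] * 90 + [[0, 0, 255]] * 10, dtype=float)
print(dominant_colour_kmeans2(pixels, k=2))   # ((255, 0, 0), 'ff0000')
```

With the answer's variables this would be called as dominant_colour_kmeans2(ar, NUM_CLUSTERS).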

Also a small bonus: save the reduced-size image with only the N most-frequent colours:

# bonus: save image using only the N most common colours
import imageio
c = ar.copy()
for i, code in enumerate(codes):
    c[vecs == i, :] = code          # paint every pixel of cluster i with its centre
imageio.imwrite('clusters.png', c.reshape(*shape).astype(np.uint8))
print('saved clustered image')
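The comments below ask how to get the second most frequent colour, for example when black dominates and should be ignored. One way, sketched here with hypothetical helper names and an assumed per-channel tolerance `tol`, is to sort the cluster indices by count with np.argsort instead of taking only np.argmax, then walk that list. It expects `codes` and `counts` arrays like the ones computed above.

```python
import numpy as np


def ranked_colours(codes, counts):
    """Cluster centres as integer RGB tuples, most frequent first."""
    order = np.argsort(counts)[::-1]            # indices by descending pixel count
    return [tuple(int(c) for c in codes[i]) for i in order]


def dominant_ignoring(codes, counts, ignore=(0, 0, 0), tol=30):
    """Most frequent centre more than `tol` away (in some channel) from `ignore`."""
    for colour in ranked_colours(codes, counts):
        if max(abs(a - b) for a, b in zip(colour, ignore)) > tol:
            return colour
    return None                                 # every centre was close to `ignore`


codes = np.array([[10.0, 12.0, 8.0], [200.0, 180.0, 60.0], [40.0, 90.0, 160.0]])
counts = np.array([500, 300, 200])
print(ranked_colours(codes, counts))     # [(10, 12, 8), (200, 180, 60), (40, 90, 160)]
print(dominant_ignoring(codes, counts))  # (200, 180, 60)
```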

answered Jul 14 '10 at 07:14 by Peter Hansen, edited by MattDMo

Wow. That's great. Almost exactly what I was looking for. I did look at scipy, and had a feeling the answer was somewhere in there :P Thank you for your answer. – Blue Peppers Jul 14 '10 at 11:38
Great answer. Worked for me. However, I had a small question. How do I access the second most frequent colour in the case that black is the most frequent and I wish to ignore it? – Frak Dec 24 '15 at 06:54
@frakman1, argmax() is just a convenience function that gives the first. What you'd need to do is sort the counts array (keeping track of the original indices), then pick the second (or second last) entry rather than the first (which is effectively what argmax does). – Peter Hansen Jan 06 '16 at 19:45
I'm getting this error: `File "_vq.pyx", line 342, in scipy.cluster._vq.update_cluster_means TypeError: type other than float or double not supported`. Any idea what this is? – Simon Steinberger Dec 02 '17 at 20:13
@SimonSteinberger With the above code or your own variant? It worked when I posted it, but that was years ago, probably using Python 2.7 and whatever scipy/numpy was reasonably current at the time. – Peter Hansen Dec 03 '17 at 01:06
With the above code on Py 2.7. I just received an answer here, which I need to test now: https://stackoverflow.com/questions/47612243/scipy-kmeans-exits-with-typeerror Possibly something changed in Scipy over the years. – Simon Steinberger Dec 03 '17 at 17:18
`ar = ar.reshape(scipy.product(shape[:2]), shape[2])` needs to be written now as `ar = ar.reshape(scipy.product(shape[:2]), shape[2]).astype(float)`. The `chr(x)` doesn't work then, because an int is expected, but it's now a float. – Simon Steinberger Dec 03 '17 at 17:33
I've edited/updated your code. Thanks for this compact and well working solution! – Simon Steinberger Dec 03 '17 at 18:04
@SimonSteinberger Thanks for the edit, and I'm happy to hear it's still able to run and help someone 7 years later! It was a fun problem to work on. – Peter Hansen Dec 03 '17 at 18:15
this has multiple issues with python 3.x. For example, (1) `.encode('hex')` is [no longer valid syntax](https://stackoverflow.com/a/2340358/2327328), and (2) `from PIL import Image` [source](https://stackoverflow.com/a/51322067/2327328) – philshem May 05 '19 at 12:12
Thanks @philshem. I believe I've modified it to support 3.x as well now. Some changes done at the same time resolved deprecations and warnings that were reported on either 2.7 or 3.7 (but not necessarily both). – Peter Hansen May 06 '19 at 14:18
What an amazing answer. Works really well for me. But just 2 simple questions: is there any way to visualize `peak` and `colour`, and would you mind explaining the difference between `peak` and `colour` please? Thanks. – Bowen Liu Feb 17 '20 at 04:26
@BowenLiu Colour is just the "hex" form of peak, which is a vector of integer values for red, green, and blue. If peak is `[175, 40, 102]`, then colour would be `"af2866"`. You can visualize it by entering the colour code into a tool like this: https://www.color-hex.com/color/af2866 (use the text box at top to try others). – Peter Hansen Feb 18 '20 at 01:55
@PeterHansen Thanks. Can I visualize with any Python tool or lib like `matplotlib`? I tried to use the `imshow` to do it but to no avail. I am trying to print out the color in the same IPython Notebook. – Bowen Liu Feb 18 '20 at 02:08
@BowenLiu That sounds like something best asked as a new question, since there are many ways to do it, and what works best will depend on details you could provide in your question. Comments here aren't really the best way to do that... – Peter Hansen Feb 19 '20 at 16:35
You are right. I will do that. All the questions on SO are about the other way around, namely getting the color array with a given picture. – Bowen Liu Feb 25 '20 at 21:03
For anyone who found this fantastic answer helpful I have an adaptation [here](https://stackoverflow.com/a/64600498/7274182) which is faster and is deterministic – Jacob Dec 29 '21 at 02:10
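On the visualisation question above: `colour` is just `peak` written as hex, so a solid swatch image is enough to see it. A small sketch with hypothetical helpers hex_to_rgb() and swatch(), using only Pillow. In a Jupyter/IPython notebook, a PIL image left as the last expression of a cell is displayed inline; elsewhere, swatch(peak).save('swatch.png') writes it to disk.

```python
from PIL import Image


def hex_to_rgb(hex_colour):
    """'af2866' or '#af2866' -> (175, 40, 102)."""
    h = hex_colour.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def swatch(rgb, size=(100, 100)):
    """Solid image filled with one colour (float components are truncated)."""
    return Image.new('RGB', size, tuple(int(c) for c in rgb))


print(hex_to_rgb('af2866'))   # (175, 40, 102)
img = swatch(hex_to_rgb('af2866'))
```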

score 57 · Answer 2 · edited Mar 03 '21 at 10:30

Try Color Thief (the colorthief package). It is based on Pillow and works very well.

Installation

pip install colorthief

Usage

from colorthief import ColorThief
color_thief = ColorThief('/path/to/imagefile')
# get the dominant color
dominant_color = color_thief.get_color(quality=1)

It can also find a color palette:

palette = color_thief.get_palette(color_count=6)

answered Aug 26 '18 at 09:01 by Artem Bernatskyi

Fantastic module – Trect Jan 13 '20 at 21:26
I am wondering whether there is a difference in the correctness of the methods in the top-voted, accepted answer and this one? I understand the other answer is older and hence might have more votes. – vangap Jan 29 '22 at 11:44

score 22 · Answer 3 · edited Apr 01 '22 at 02:53

You can do this in many different ways. And you don't really need SciPy and k-means, since Pillow can already do the heavy lifting when you either resize the image or reduce it to a small palette (resizing averages neighbouring pixels; the adaptive palette is built with median cut by default, not k-means).

Solution 1: resize image down to 1 pixel.

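from PIL import Image

def get_dominant_color(pil_img):
    img = pil_img.convert("RGBA")   # convert() returns a new image, so no copy is needed
    # BOX averages every pixel into the single output pixel;
    # resample=0 (NEAREST) would just pick one pixel from the middle
    img = img.resize((1, 1), resample=Image.Resampling.BOX)
    dominant_color = img.getpixel((0, 0))
    return dominant_color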

Solution 2: reduce image colors to a palette

from PIL import Image

def get_dominant_color(pil_img, palette_size=16):
    # Work on an RGB copy (adaptive quantization rejects RGBA with the default method)
    img = pil_img.convert('RGB')
    # Resize image to speed up processing; thumbnail() keeps the aspect ratio
    img.thumbnail((100, 100))

    # Reduce colors to an adaptive palette (median cut, not k-means)
    paletted = img.convert('P', palette=Image.Palette.ADAPTIVE, colors=palette_size)

    # Find the color that occurs most often
    palette = paletted.getpalette()
    color_counts = sorted(paletted.getcolors(), reverse=True)   # (count, index), largest first
    palette_index = color_counts[0][1]
    dominant_color = palette[palette_index*3:palette_index*3+3]

    return dominant_color

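Both solutions give similar results on many images. The latter is probably more accurate: thumbnail() keeps the aspect ratio when shrinking the image, and the palette approach returns a colour that actually occurs in the (quantized) image, whereas a 1x1 resize returns the average colour. It also gives you more control, since you can tweak palette_size.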

This is also leaps and bounds faster than any of the scikit-learn/SciPy answers above. — whlteXbread, May 15 '20 at 02:38
Works like a charm, and doesn't require any additional modules. Thank you so much! — RealA10N, Feb 04 '21 at 14:53
Thanks for the piece of code, also could you explain how to know what colour is this (Red, blue...). It would be great if you can provide some piece of code. — Adithya Raj, Apr 04 '22 at 12:56
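On naming the colour: one simple approach, sketched here, is to pick the entry of a small name-to-RGB table with the smallest squared Euclidean distance. The table BASIC_COLOURS and the function nearest_colour_name() are hypothetical. Plain RGB distance is only a crude measure of perceived similarity, so a larger table or a perceptual colour space may give better names.

```python
# Hypothetical table: any name -> (r, g, b) mapping works; these eight are examples.
BASIC_COLOURS = {
    'black': (0, 0, 0), 'white': (255, 255, 255), 'red': (255, 0, 0),
    'green': (0, 128, 0), 'blue': (0, 0, 255), 'yellow': (255, 255, 0),
    'orange': (255, 165, 0), 'grey': (128, 128, 128),
}


def nearest_colour_name(rgb, table=BASIC_COLOURS):
    """Name whose RGB value has the smallest squared Euclidean distance to `rgb`."""
    return min(table, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, table[name])))


print(nearest_colour_name((250, 10, 20)))   # red
```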

score 21 · Answer 4 · answered Jul 13 '10 at 23:19

Python Imaging Library has a getcolors method on Image objects:

im.getcolors(maxcolors=256) => a list of (count, color) tuples, or None if the image has more than maxcolors distinct colors

I guess you can still try resizing the image before that and see if it performs any better.
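A quick demonstration of the None case, using a made-up 20x20 image in which all 400 pixels have distinct colours: getcolors() with the default maxcolors=256 gives None, while passing the pixel count returns every colour.

```python
from PIL import Image

# synthetic 20x20 image in which every pixel has a different colour (400 in total)
img = Image.new('RGB', (20, 20))
img.putdata([(x * 12, y * 12, 0) for y in range(20) for x in range(20)])

print(img.getcolors())                      # None: 400 colours > default maxcolors=256
colours = img.getcolors(maxcolors=20 * 20)  # large enough to count every colour
print(len(colours))                         # 400
count, colour = max(colours)                # most common colour (ties broken by value)
```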

answered by zvone

score 8 · Answer 5 · edited Feb 06 '23 at 10:21

It's not necessary to use k-means to find the dominant color as Peter suggests. This overcomplicates a simple problem. You're also restricted by the number of clusters you select, so you basically need an idea of what you're looking at.

As you mentioned and as suggested by zvone, a quick way to find the most common/dominant color is to use the Pillow library. We just need to sort the colors by their pixel count.

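from PIL import Image

def find_dominant_color(filename):
    # Resizing parameters
    width, height = 150, 150
    # convert('RGB') so "P" or "RGBA" files yield plain (r, g, b) tuples
    image = Image.open(filename).convert('RGB')
    # NEAREST (resample=0) keeps only colors that really occur in the image
    image = image.resize((width, height), resample=Image.Resampling.NEAREST)
    # Get colors from image object; maxcolors = pixel count, so this is never None
    pixels = image.getcolors(width * height)
    # Sort them by count number (first element of tuple)
    sorted_pixels = sorted(pixels, key=lambda t: t[0])
    # Get the most frequent color
    dominant_color = sorted_pixels[-1][1]
    return dominant_color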

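The only catch is that getcolors() returns None when the image has more distinct colors than its maxcolors argument (256 by default). Passing width * height, as above, guarantees every color is counted; resizing the original image first keeps that list short.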

In all, it might not be the most precise solution, but it gets the job done.

This is not very reliable. (1) you should use `thumbnail` instead of resize to avoid crop or stretch, (2) if you have an image with 2 white pixels and 100 different levels of blackish pixels, you will still get white. — Pithikos, May 11 '20 at 11:27
Agreed but I wanted to avoid the caveat of reducing the granularity when using predefined clusters or a palette. Depending on the use case this might not be desirable. — mobiuscreek, Jul 10 '20 at 15:51
The [resampling filter](https://pillow.readthedocs.io/en/stable/reference/Image.html#resampling-filters) you use might be the cheapest, but the result can be very surprising to users if the picture has really high resolution and is a bit noisy. — Wolf, Feb 06 '23 at 10:26

score 6 · Answer 6 · edited Jan 23 '17 at 18:59

If you're still looking for an answer, here's what worked for me, albeit not terribly efficient:

from PIL import Image

def compute_average_image_color(img):
    img = img.convert('RGB')  # PNGs may be RGBA or P; getpixel then returns (r, g, b)
    width, height = img.size

    r_total = 0
    g_total = 0
    b_total = 0

    count = 0
    for x in range(0, width):
        for y in range(0, height):
            r, g, b = img.getpixel((x,y))
            r_total += r
            g_total += g
            b_total += b
            count += 1

    return (r_total/count, g_total/count, b_total/count)

img = Image.open('image.png')
#img = img.resize((50,50))  # Small optimization
average_color = compute_average_image_color(img)
print(average_color)
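If speed matters, the same average can be computed without a Python loop. A sketch assuming NumPy is available; converting to RGB first also sidesteps the RGBA issue raised in the comments below. The helper name average_colour() is hypothetical.

```python
import numpy as np
from PIL import Image


def average_colour(img):
    """Mean (R, G, B) over all pixels, using NumPy instead of a Python loop."""
    pixels = np.asarray(img.convert('RGB'), dtype=np.float64)   # (height, width, 3)
    return tuple(float(c) for c in pixels.reshape(-1, 3).mean(axis=0))


img = Image.new('RGB', (4, 2), (200, 100, 0))
img.paste((0, 100, 200), (0, 0, 2, 2))   # left half gets a different colour
print(average_colour(img))               # (100.0, 100.0, 100.0)
```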

answered Nov 08 '15 at 10:03 by Tim S

For png, you need to tweak this slightly to handle the fact that img.getpixel returns r,g,b,a (four values instead of three). Or it did for me anyway. – rossdavidh Sep 26 '16 at 15:28
This weighs pixels unevenly. The final pixel touched contributes half the total value. The pixel before contributes half of that. Only the last 8 pixels will affect the average at all, in fact. – Russell Borogove Jan 22 '17 at 13:52
You're right - silly mistake. Just edited the answer - let me know if that works. – Tim S Jan 23 '17 at 19:00
This is not an answer to this question. Average color is not the dominant color in an image. – Phani Rithvij Aug 15 '19 at 13:12

Jacob · Answer 7 · 2022-04-13T15:41:11.347

My solution

Here's my adaptation based on Peter Hansen's solution.

import numpy
import scipy.cluster.vq
import sklearn.cluster
from PIL import Image

def dominant_colors(image):  # PIL image input

    image = image.convert('RGB')          # ensure exactly 3 channels
    image = image.resize((150, 150))      # optional, to reduce time
    ar = numpy.asarray(image)
    shape = ar.shape
    ar = ar.reshape(numpy.prod(shape[:2]), shape[2]).astype(float)

    kmeans = sklearn.cluster.MiniBatchKMeans(
        n_clusters=10,
        init="k-means++",
        max_iter=20,
        random_state=1000
    ).fit(ar)
    codes = kmeans.cluster_centers_

    vecs, _dist = scipy.cluster.vq.vq(ar, codes)         # assign codes
    counts = numpy.bincount(vecs, minlength=len(codes))  # count occurrences per code

    colors = []
    for index in numpy.argsort(counts)[::-1]:
        colors.append(tuple([int(code) for code in codes[index]]))
    return colors                    # returns colors in order of dominance

What are the differences/improvements?

It's (subjectively) more accurate

It uses k-means++ to pick the initial cluster centers, which gives better results. (k-means++ may not be the fastest way to pick cluster centers, though.)

It's faster

Using sklearn.cluster.MiniBatchKMeans is significantly faster and gives very similar colors to the default KMeans algorithm. You can always try the slower sklearn.cluster.KMeans and compare the results and decide whether the tradeoff is worth it.

It's deterministic

I am using a random_state to get consistent output (I believe the original scipy.cluster.vq.kmeans also has a seed parameter). Before adding a random state, I found that certain inputs could produce significantly different outputs from run to run.

Benchmarks

I decided to very crudely benchmark a few solutions.

Method	Time (100 iterations)
Peter Hansen (kmeans)	58.85
Artem Bernatskyi (Color Thief)	61.29
Artem Bernatskyi (Color Thief palette)	15.69
Pithikos (PIL resize)	0.11
Pithikos (palette)	1.68
Mine (mini batch kmeans)	6.31

Thanks for adding to the solution Jacob! It's gratifying to see this fun question still helping people out. :) — Peter Hansen, Jan 06 '22 at 22:47

score 4 · Answer 8 · answered Jul 13 '10 at 23:46

You could use PIL to repeatedly resize the image down by a factor of 2 in each dimension until it reaches 1x1. I don't know what algorithm PIL uses for downscaling by large factors, so going directly to 1x1 in a single resize might lose information. It might not be the most efficient, but it will give you the "average" color of the image.

score 4 · Answer 9 · answered Jan 27 '11 at 03:52

To add to Peter's answer: if PIL gives you an image with mode "P" (palette), np.asarray() returns a 2-D array of palette indices and the reshape fails. Adding an alpha channel promotes it to "RGBA":

if im.mode == 'P':
    im.putalpha(0)

Alternatively, im = im.convert('RGB') handles "P", "L", "RGBA" and most other modes, and keeps the clustering to three channels.

answered by Samuel Clay

score 2 · Answer 10 · answered Jan 17 '12 at 07:44

Below is a C++ Qt-based example that guesses the predominant image color. You can use PyQt to translate it into equivalent Python.

#include <QtCore/QCoreApplication>
#include <QtCore/QDebug>
#include <QtCore/QHash>
#include <QtGui/QImage>

int main(int argc, char** argv)
{
    QCoreApplication app(argc, argv);   // sets up plugin paths; no event loop is started
    QImage image("logo.png");           // QImage, unlike QPixmap, needs no GUI application
    QHash<QRgb, int> rgbcount;          // colour -> number of pixels with that colour
    QRgb greatest = 0;
    int count = 0;

    for (int i = 0; i < image.width(); ++i)
    {
        for (int j = 0; j < image.height(); ++j)
        {
            QRgb col = image.pixel(i, j);
            int n = ++rgbcount[col];    // operator[] inserts 0 for an unseen colour
            if (n > count) {            // track the most frequent colour so far
                greatest = col;
                count = n;
            }
        }
    }
    qDebug() << count << greatest;
    return 0;                           // app.exec() would block forever with no windows
}
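The same counting idea in Python, sketched with Pillow and NumPy instead of PyQt (a swap of libraries, not a literal translation of the Qt calls). np.unique(..., axis=0, return_counts=True) counts each distinct RGB triple. predominant_colour() is a hypothetical name; like the C++ version, it returns the count together with the colour.

```python
import numpy as np
from PIL import Image


def predominant_colour(img):
    """Return (count, (r, g, b)) for the most frequent exact RGB value."""
    pixels = np.asarray(img.convert('RGB')).reshape(-1, 3)        # one row per pixel
    colours, counts = np.unique(pixels, axis=0, return_counts=True)
    i = int(np.argmax(counts))
    return int(counts[i]), tuple(int(c) for c in colours[i])


img = Image.new('RGB', (3, 3))
img.putdata([(255, 0, 0)] * 5 + [(0, 0, 255)] * 4)
print(predominant_colour(img))   # (5, (255, 0, 0))
```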

quine9997 · Answer 11 · 2021-07-01T21:52:04.920

This is a complete script with a function compute_average_image_color().

Just copy and paste it, and change the path to your image.

My image is img_path='./dir/image001.png'

# AVERAGE COLOR, MIN, MAX, STANDARD DEVIATION
# ONLY NON-TRANSPARENT PIXELS ARE USED

import os
import sys

import numpy as np
from PIL import Image


def compute_average_image_color(img_path):

    if not os.path.isfile(img_path):
        print(img_path, 'DOES NOT EXIST, EXIT')
        sys.exit()

    # LOAD IMAGE
    img = Image.open(img_path).convert('RGBA')
    img = img.resize((50, 50))  # small optimization

    # DEFINE SOME VARIABLES
    width, height = img.size
    red_list = []
    green_list = []
    blue_list = []

    # READ AND CHECK PIXEL BY PIXEL, SKIPPING FULLY TRANSPARENT ONES
    for x in range(width):
        for y in range(height):
            r, g, b, alpha = img.getpixel((x, y))
            if alpha != 0:
                red_list.append(r)
                green_list.append(g)
                blue_list.append(b)

    count = len(red_list)
    if count == 0:
        print(img_path, 'IS FULLY TRANSPARENT, EXIT')
        sys.exit()

    # CALCULATE THE AVERAGE COLOR, MIN, MAX, ETC
    average_color = (round(sum(red_list) / count),
                     round(sum(green_list) / count),
                     round(sum(blue_list) / count))
    print(average_color)

    print('red_min_max: ', [min(red_list), max(red_list)])
    print('green_min_max: ', [min(green_list), max(green_list)])
    print('blue_min_max: ', [min(blue_list), max(blue_list)])

    # STANDARD DEVIATION (SQUARE ROOT OF THE VARIANCE)
    print('red_stddev: ', round(float(np.std(red_list))))
    print('green_stddev: ', round(float(np.std(green_list))))
    print('blue_stddev: ', round(float(np.std(blue_list))))


img_path = './dir/image001.png'
compute_average_image_color(img_path)

Can you explain your code a little bit? (What libraries or modules you used if any and why). It would be nice for others to understand your research, the downsides and upsides of your code and alternatives. It's always better to add some explanation in order to provide context to readers. — Maicon Mauricio, Jul 01 '21 at 21:11
Look more closely. This is a complete Python script. There are several import statements, and the code is commented throughout. Because it is a script, the user can copy and paste it; you just have to change the name of the input image, image001.png — quine9997, Jul 01 '21 at 21:31
The question is not about getting the average color, but about getting the dominant one, this script can return a color that doesn't exist at all in the original image (for a simple example, think of the image of a random country flag, and what it'll return, for France it'll be a clear purple, while the dominant color should be one of red, white and blue, as they are equally present). — Tshirtman, Apr 05 '23 at 17:39
@Tshirtman Read the script code: #CALCULATE THE AVERAGE COLOR, MIN, MAX, ETC — quine9997, Apr 06 '23 at 09:16