59

So I'm writing a few programs for editing photos in Python using PIL, and one of them converts an image to greyscale (I'm avoiding the use of any functions from PIL).

The algorithm I've employed is simple: for each pixel (colour-depth is 24), I've calculated the average of the R, G and B values and set the RGB values to this average.

My program was producing greyscale images which seemed accurate, but I was wondering if I'd employed the correct algorithm, and I came across this answer to a question, where it seems that the 'correct' algorithm is to calculate 0.299 R + 0.587 G + 0.114 B.

I decided to compare my program to this algorithm. I generated a greyscale image using my program and another one (using the same input) from a website online (the top Google result for 'image to grayscale').

To my naked eye, it seemed that they were exactly the same, and if there was any variation, I couldn't see it. However, I decided to use this website (top Google result for 'compare two images online') to compare my greyscale images. It turned out that deep in the pixels, they had slight variations, but none which were perceivable to the human eye at a first glance (differences can be spotted, but usually only when the images are laid upon each other or switched between within milliseconds).

My Questions (the first is the main question):

  1. Are there any disadvantages to using my 'rough' greyscale algorithm?
  2. Does anyone have any input images where my greyscale algorithm would produce a visibly different image to the one that would be 'correct'?
  3. Are there any colours/RGB combinations for which my algorithm won't work as well?

My key piece of code (if needed):

def greyScale(pixelTuple):
    return tuple([round(sum(pixelTuple) / 3)] * 3)

The 'correct' algorithm (which seems to heavily weight green):

def greyScale(pixelTuple):
    return tuple([round(0.299 * pixelTuple[0] + 0.587 * pixelTuple[1] + 0.114 * pixelTuple[2])] * 3)

My input image: [image]

The greyscale image my algorithm produces: [image]

The greyscale image which is 'correct': [image]

When the greyscale images are compared online (highlighted red are the differences, using a fuzz of 10%): [image]

Despite the variations in pixels highlighted above, the two greyscale images appear nearly identical (at least, to me).

Also, regarding my first question, if anyone's interested, this site has done some analysis on different algorithms for conversions to greyscale and also has some custom algorithms.

EDIT:

In response to @Szulat's answer, my algorithm actually produces this image instead (ignore the bad cropping, the original image had three circles but I only needed the first one):

This is what my algorithm **actually** produces

In case people are wondering what the reason for converting to greyscale is (as it seems that the algorithm depends on the purpose), I'm just making some simple photo editing tools in python so that I can have a mini-Photoshop and don't need to rely on the Internet to apply filters and effects.

Reason for Bounty: Different answers here are covering different things, which are all relevant and helpful. This makes it quite difficult to choose which answer to accept. I've started a bounty because I like a few answers listed here, but also because it'd be nice to have a single answer which covers everything I need for this question.

Adi219
  • You should notice that the differences occur where the input image is very green, because the *"correct"* formula is weighted towards green. – Mark Setchell Aug 13 '18 at 08:57
  • Also check out that the way to grayscale has heavy influence on the aesthetic: https://photo.stackexchange.com/questions/86599/which-color-filter-do-i-use-for-a-black-white-portrait – Framester Aug 13 '18 at 09:00
  • To get the correct formula also check out: https://stackoverflow.com/questions/596216/formula-to-determine-brightness-of-rgb-color – Framester Aug 13 '18 at 09:01
  • More detail is preserved in the bird's feathers in, as you say, the 'correct' algorithm – Coloured Panda Aug 13 '18 at 11:35
  • 2
    "but none which were perceivable to the human eye": I put the two images in two Firefox tabs and used (Shift+)Ctrl+Tab to switch between them. To my eyes, the difference is *very* large; in fact, it is impossible not to see it. But I do agree that none of the options is 'obviously' better than the other, and -- of course -- the adjective 'better' is highly subjective and/or dependent on your particular application. – Andreas Rejbrand Aug 13 '18 at 11:44
  • Why even approximate? Yes, converting to grayscale pixel-by-pixel in Python with PIL is *slow*, but if you use the `matrix` parameter of `Image.convert` instead, the transform is done in C (not Python) and is considerably faster. – Mr. Llama Aug 13 '18 at 20:33
  • You will notice the difference if you try to convert something with a large green area next to a large blue area. – user253751 Aug 14 '18 at 01:41
  • It is trivial to produce an image that becomes uniformly gray with one of the mappings but not the other, so I am not sure how to interpret point 2. – Carsten S Aug 14 '18 at 08:27
  • @MarkSetchell Yeah I noticed a difference when I converted a picture of a grassy cliff to greyscale. – Adi219 Aug 22 '18 at 12:42
  • @Framester Thanks, that's quite useful :) – Adi219 Aug 22 '18 at 12:42
  • @Kaspars I don't quite see what you claim... – Adi219 Aug 22 '18 at 12:42
  • @AndreasRejbrand I meant as in when you put them next to each other, of course if you switched tabs or used a `.gif` (as in the accepted answer), then you're bound to notice differences. – Adi219 Aug 22 '18 at 12:43
  • @Mr.Llama I'm only using `PIL` to open/close/save the files, I'm using pure `python` for the editing algorithms. – Adi219 Aug 22 '18 at 12:44
  • @immibis Thanks, I noticed the difference when I converted a picture of a grassy cliff to greyscale. – Adi219 Aug 22 '18 at 12:44
  • @CarstenS Well, I am not sure how to interpret your comment simply because it's not trivial for me to produce an image with your description! – Adi219 Aug 22 '18 at 12:45
  • @Adi219, fair enough. Take any image that uses only the RGB colours (1,0,0), (0,1,0), (0,0,1), for example. (You could even add (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5), (0.33, 0.33, 0.34) to have more colours to choose from.) That is covered by szulat's answer, though, I think. – Carsten S Aug 22 '18 at 15:00

8 Answers

47

The images look pretty similar, but your eye can tell the difference, especially if you put one in place of the other:

[animated GIF alternating between the two greyscale images]

For example, you can note that the flowers in the background look brighter in the averaging conversion.

It is not that there is anything intrinsically "bad" about averaging the three channels. The reason for that formula is that we do not perceive red, green and blue equally, so their contributions to the intensities in a grayscale image shouldn't be the same; since we perceive green more intensely, green pixels should look brighter on grayscale. However, as commented by Mark there is no unique perfect conversion to grayscale, since we see in color, and in any case everyone's vision is slightly different, so any formula will just try to make an approximation so pixel intensities feel "right" for most people.
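To put numbers on this, the short sketch below runs both formulas from the question on pure red, green and blue. The function names `grey_average` and `grey_weighted` are made up for this illustration; the weights are the ones quoted in the question.

```python
def grey_average(pixel):
    """Plain mean of the three channels (the question's approach)."""
    return round(sum(pixel) / 3)


def grey_weighted(pixel):
    """Weighted sum using the 0.299 / 0.587 / 0.114 weights from the question."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


primaries = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
for name, pixel in primaries.items():
    print(name, grey_average(pixel), grey_weighted(pixel))
# red 85 76
# green 85 150
# blue 85 29
```

Averaging maps all three primaries to the same grey, so for instance saturated blue detail on a saturated green background would vanish, while the weighted version keeps green much brighter than blue.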

jdehesa
  • I haven't awarded the bounty to you as I feel that the number of upvotes on your answer is enough :) I've accepted however, as I truly feel that your answer's quite nice, and the `.gif` only has a positive effect :) Thanks! – Adi219 Aug 22 '18 at 12:26
  • 2
    @Adi219 No problem, the number of upvotes is indeed way more than I was expecting. Never underestimate bird gifs I guess. Thank you for accepting the answer. – jdehesa Aug 22 '18 at 12:30
  • 2
    Indeed, I quite liked the bird `.gif` (and I feel loads of others did too :) ). I was also quite surprised by the amount of upvotes my question received as well as the number of views :) No problem! I also respect this answer as you've referred to @MarkSetchell's answer which I feel is quite nice, especially since his answer was the first and I was initially going to accept his answer back on the day on which I posted this question. I appreciate this! :) – Adi219 Aug 22 '18 at 12:36
42

The most obvious example:

  1. Original

  2. Desaturated in Gimp (Lightness mode - this is what your algorithm does)

  3. Desaturated in Gimp (Luminosity mode - this is what our eyes do)

gimp desaturate: lightness vs luminosity

So, don't average RGB. Averaging RGB is simply wrong!

(Okay, you're right, averaging might be valid in some obscure applications, even though it has no physical or physiological meaning when RGB values are treated as color. By the way, the "regular" way of doing weighted averaging is also incorrect in a more subtle way because of gamma: sRGB should first be linearized, and the final result then converted back to sRGB, which would be equivalent to retrieving the L component in the Lab color space.)

szulat
  • 24
    +1 for the very illustrative image. -1 for "averaging RGB is simply wrong", since it entirely depends on the current application. – Andreas Rejbrand Aug 13 '18 at 16:42
  • https://en.wikipedia.org/wiki/File:7bit-each.svg another good demonstration image of eye sensitivity to rgb – qwr Aug 14 '18 at 04:36
  • 3
    @AndreasRejbrand Despite the number of upvotes this answer has, no upvotes from me, as the illustrative image **isn't what my algorithm produces** (see my edit to my question) , despite this answer's claims that my algorithm has the same effect as Desaturated in Gimp in Lightness mode. – Adi219 Aug 14 '18 at 07:54
  • 1
    Despite the number of upvotes this answer has received, I'm not awarding this answer **anything** due to the technical inaccuracies present, even after they were pointed out days before. – Adi219 Aug 22 '18 at 12:27
  • 1
    yeah, the "crowd wisdom" is sometimes disappointing... of course the picture is incorrect and i still don't have time to update it :-( (although i believe it does not change the conclusion) – szulat Aug 22 '18 at 13:13
20

You can use any conversion equation, scale, linearity. The one you found:

I = 0.299 R + 0.587 G + 0.114 B

is based on the primary color (R, G, B) sensitivity of an "average" human eye (at least for the time period and population/HW it was created on; bear in mind those standards were created before LED, TFT, etc. screens).

There are several problems you are fighting against:

  1. our eyes are not the same

    Not all humans perceive color the same way. There are major discrepancies between genders and smaller ones between regions; even generation and age play a role. So even an average should be handled as "average".

    We have different sensitivity to intensity of light across the visible spectrum. The most sensitive color is green (hence the highest weight on it). But the XYZ curve peaks can be at different wavelengths for different people (mine are shifted a bit, which changes how I recognize certain wavelengths, such as some shades of aqua: some people see them as green and some as blue, even though none of them has any color blindness).

  2. monitors do not use the same wavelengths nor spectral dispersion

    So if you take 2 different monitors, they might use slightly different wavelengths for R, G, B or even different widths of the spectral filter (just use a spectroscope and see). Yes they should be "normalized" by the HW but that is not the same as using normalized wavelengths. It is similar to problems using RGB vs. White Noise spectrum light sources.

  3. monitor linearity

    Humans do not see on a linear scale: our response is usually logarithmic/exponential (depending on how you look at it). We can normalize that with HW (or even SW), but the problem is that if we linearize for one person, we make it worse for another.

If you take all this together you can either use averages ... or special (and expensive) equipment to measure/normalize against some standard or against a calibrated person (depends on the industry).

But that is too much to handle in home conditions so leave all that for industry and use the weights for "average" like most of the world... Luckily our brain can handle it as you cannot see the difference unless you start comparing both images side by side or in an animation :). So I (would) do:

I = 0.299 R + 0.587 G + 0.114 B
R = I
G = I
B = I
psmears
Spektre
  • 1
    I've awarded this answer the bounty as I feel that it deserves more upvotes and the answer itself is actually quite detailed, so it's quite nice :) – Adi219 Aug 22 '18 at 12:28
  • 1
    @Adi219 thx ... the lack of votes is most likely due to absence of images as I did not want to add redundant images other answers already got nor copy the images from linked QAs ... PS there are also applications where `I= R+G+B` is needed like this [convert RGB pixel to wavelength](https://stackoverflow.com/a/42542687/2521214) but those are usually only for special reasons/tasks/HW ... – Spektre Aug 22 '18 at 15:51
  • I thought it was just due to the day you answered, but maybe you're right. But yeah, I understand your decision. Thanks! – Adi219 Aug 23 '18 at 08:53
8

There are many different methods for converting to greyscale, and they do give different results though the differences might be easier to see with different input colour images.

As we don't really see in greyscale, the "best" method is somewhat dependent on the application and somewhat in the eye of the beholder.

The alternative formula you refer to is based on the human eye being more sensitive to variations in green tones and therefore giving them a bigger weighting - similarly to a Bayer array in a camera where there are 2 green pixels for each red and blue one. Wiki - Bayer array

CJ Dennis
Mark Setchell
  • I know you were the first to answer, and I quite liked your answer, but simply put, the other answers just have more detail. Sorry :( – Adi219 Aug 22 '18 at 12:24
  • 2
    That's cool - you are at liberty to choose whichever answer you prefer - no complaints from me! Like other responders, I was just trying to help. Good luck with your project! – Mark Setchell Aug 22 '18 at 12:27
  • Thanks for understanding :) Also, thanks for the answer! Nearly all the answers here have helped me tremendously; if all goes well, I'll put all of my photo-editing programs on Github for others to use. Thanks! :) – Adi219 Aug 22 '18 at 12:33
8

There are many formulas for the Luminance, depending on the R,G,B color primaries:

Rec.601/NTSC: Y = 0.299*R + 0.587*G + 0.114*B , 

Rec.709/EBU:  Y = 0.213*R + 0.715*G + 0.072*B , 

Rec.2020/UHD: Y = 0.263*R + 0.678*G + 0.059*B . 

This is all because our eyes are less sensitive to blue than to red, and less sensitive to red than to green.
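As a rough illustration of how much the choice of standard matters, the sketch below applies the three weight sets listed above to a pure green pixel. The dictionary `WEIGHTS` and the function `weighted_grey` are names invented for this example, and the weights are the rounded values quoted above.

```python
# Weights as quoted above, keyed by standard (rounded to three decimals).
WEIGHTS = {
    "Rec.601": (0.299, 0.587, 0.114),
    "Rec.709": (0.213, 0.715, 0.072),
    "Rec.2020": (0.263, 0.678, 0.059),
}


def weighted_grey(pixel, standard="Rec.601"):
    """Weighted sum of an (R, G, B) tuple using the chosen standard's weights."""
    wr, wg, wb = WEIGHTS[standard]
    r, g, b = pixel
    return round(wr * r + wg * g + wb * b)


for standard in WEIGHTS:
    print(standard, weighted_grey((0, 255, 0), standard))
# Rec.601 150
# Rec.709 182
# Rec.2020 173
```

Each weight set sums to 1, so neutral greys are unchanged by all three; only saturated colours shift.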

That being said, you are probably calculating Luma, not Luminance, so the formulas are all wrong anyway. For Constant-Luminance you must convert to linear-light

R = R' ^ 2.4 , G = G' ^ 2.4 , B = B' ^ 2.4 , 

apply the Luminance formula, and convert back to the gamma domain

Y' = Y ^ (1/2.4) . 
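A minimal sketch of the difference, assuming 8-bit channels, a pure power-law gamma of 2.4 as in the formulas above, and the Rec.601 weights from the question. The names `luma` and `constant_luminance_grey` are made up for this illustration; real sRGB uses a slightly different piecewise curve, which is not modelled here.

```python
GAMMA = 2.4  # exponent used in the formulas above
REC601 = (0.299, 0.587, 0.114)


def luma(pixel, weights=REC601):
    """Weighted sum applied directly to the gamma-encoded values."""
    return round(sum(w * c for w, c in zip(weights, pixel)))


def constant_luminance_grey(pixel, weights=REC601):
    """Linearise each channel, take the weighted sum, then re-apply the gamma."""
    linear = [(c / 255) ** GAMMA for c in pixel]
    y = sum(w * c for w, c in zip(weights, linear))
    return round(255 * y ** (1 / GAMMA))


for name, pixel in (("red", (255, 0, 0)), ("blue", (0, 0, 255)), ("grey", (128, 128, 128))):
    print(name, luma(pixel), constant_luminance_grey(pixel))
```

Neutral greys come out the same either way, while saturated red and blue come out noticeably brighter with the linear-light version than with the plain weighted sum.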

Also, consider that converting a 3D color space to a 1D quantity loses 2/3 of the information, which can bite you in the next processing steps. Depending on the problem, sometimes a different formula is better, like V = MAX(R,G,B) (from HSV color space).

How do I know? I'm a follower and friend of Dr. Poynton.

StessenJ
  • Sorry, but this answer makes little sense to me as a non-expert in colour theory. – Adi219 Aug 22 '18 at 12:31
  • Dr. Poynton is Charles Poynton, recently promoted to PhD. He has written several authoritative books in the field of video processing. Recommended reading. – StessenJ Aug 26 '18 at 05:51
  • One of the things learned in video processing is that color space conversion must be done on linear-light signals. The compression done by the OETF, e.g. the gamma function, must first be undone, as in my example. If not, then the greyscale values for red and blue will be too low, too dark. This shows the "Constant Luminance Error" of (Y',Cb,Cr) signals, i.e. Cb,Cr carry some of the Luminance too, for red and blue. – StessenJ Aug 26 '18 at 05:55
5

The answers provided are enough, but I want to discuss this topic from a different angle.

Since I learnt digital painting as a hobby, I mostly use HSV.

HSV is much easier to control while painting but, to keep it short, the main point is the S: saturation separates the concept of color from that of light. Setting S to 0 already gives the 'computer' greyscale of an image.

from PIL import Image
import colorsys

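def togrey(img):
    # Only PIL images are handled; anything else yields None.
    if isinstance(img,Image.Image):
        # Make sure there are exactly three bands to unpack.
        r,g,b = img.convert('RGB').split()
        R = []
        G = []
        B = []
        for rd,gn,bl in zip(r.getdata(),g.getdata(),b.getdata()) :
            # Convert to HSV, drop the saturation, and convert back to RGB.
            h,s,v = colorsys.rgb_to_hsv(rd/255.,gn/255.,bl/255.)
            s = 0
            _r,_g,_b = colorsys.hsv_to_rgb(h,s,v)
            # round() before int() so that float error cannot turn
            # e.g. 254.999... into 254.
            R.append(int(round(_r*255.)))
            G.append(int(round(_g*255.)))
            B.append(int(round(_b*255.)))
        r.putdata(R)
        g.putdata(G)
        b.putdata(B)
        return Image.merge('RGB',(r,g,b))
    else:
        return None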

a = Image.open('../a.jpg')
b = togrey(a)
b.save('../b.jpg')

This method truly preserves the 'brightness' of the original color. However, it does not consider how the human eye processes the data.
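For what it's worth, `colorsys` defines V as the largest of the three channels, so setting S to 0 and converting back amounts to taking `max(r, g, b)`. The sketch below checks that on a few pixels without needing PIL; the function names `grey_via_hsv` and `grey_via_max` are invented for this illustration.

```python
import colorsys


def grey_via_hsv(pixel):
    """Round-trip through HSV with the saturation forced to 0."""
    r, g, b = (c / 255 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    grey, _, _ = colorsys.hsv_to_rgb(h, 0, v)
    return round(grey * 255)


def grey_via_max(pixel):
    """Shortcut: the brightest channel is the grey value."""
    return max(pixel)


for pixel in [(255, 0, 0), (12, 200, 77), (90, 90, 90), (0, 0, 0)]:
    print(pixel, grey_via_hsv(pixel), grey_via_max(pixel))
```

The shortcut avoids the floating-point round trip entirely, which should also be a good deal faster in a pure-Python per-pixel loop.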

MT-FreeHK
  • I think HSV has nothing to do with considering the human eye. If you look at the [colorsys conversions](https://github.com/kwlzn/python-sources/blob/master/Python-3.2.2/Lib/colorsys.py#L116) you can see that `rgb`->`hsv` sets `v` to `max(r, g, b)`, and converting `hsv`->`rgb` returns `(v, v, v)` when `s` == `0`. So, there's no magic -- just a different solution. [Here's the image](https://i.imgur.com/6SrTX56.png) `grey = max(r, g, b)` produces. – Alistair Carscadden Aug 16 '18 at 09:59
  • More off topic, I really like the `grey = max(r, g, b)` image. Great contrast, bright bird. – Alistair Carscadden Aug 16 '18 at 10:00
  • @AlistairCarscadden , yes, as I stated at the end, this method is not considering how human eye process the data – MT-FreeHK Aug 16 '18 at 10:03
  • And I commented because I don't agree, I think the method does consider how we see light. Something bright blue is bright, something bright green is bright, and something bright red is bright. So, max(r, g, b) considers that entirely. – Alistair Carscadden Aug 16 '18 at 10:08
  • @AlistairCarscadden, not really. Consider a photo taken with a long exposure: you can recognize the colors but most of the detail is lost. In this case, the picture produced by the HSV method will definitely be worse. So in fact that's actually a contrast map. – MT-FreeHK Aug 16 '18 at 12:25
  • This illustrates how any 1D greyscale value loses 2/3 of the information in the 3D color space. V = MAX(R,G,B) is a great value for *some* purposes, particularly for quantifying the display drive. If V=1 then the display can not be driven harder, for all colors. A Luminance value greatly under-estimates the display drive for red and blue. Picking a good 1D greyscale value is critical for HDR-SDR tone mapping. – StessenJ Aug 26 '18 at 06:01
1

In answer to your main question, there are disadvantages in using any single measure of grey. It depends on what you want from your image. For example, if you have colored text on white background, if you want to make the text stand out you can use the minimum of the r, g, b values as your measure. But if you have black text on a colored background, you can use the maximum of the values for the same result. In my software I offer the option of max, min or median value for the user to choose. The results on continuous tone images are also illuminating. In response to comments asking for more details, the code for a pixel is below (without any defensive measures).

int Ind0[3] = {0, 1, 2};                 //all equal
int Ind1[3] = {2, 1, 0};                 // top, mid ,bot from mask...
int Ind2[3] = {1, 0, 2};
int Ind3[3] = {1, 2, 0};
int Ind4[3] = {0, 2, 1};
int Ind5[3] = {2, 0, 1};
int Ind6[3] = {0, 1, 2};
int Ind7[3] = {-1, -1, -1};              // not possible
int *Inds[8] = {Ind0, Ind1, Ind2, Ind3, Ind4, Ind5, Ind6, Ind7};
void grecolor(unsigned char *rgb, int bri, unsigned char *grey)
{                         //pick out bot, mid or top according to bri flag
    int r = rgb[0];
    int g = rgb[1];
    int b = rgb[2];
    int mask = 0;
    mask |= (r > g);
    mask <<= 1;
    mask |= (g > b);
    mask <<= 1;
    mask |= (b > r);
    grey[0] = rgb[Inds[mask][2 - bri]];  // 2, 1, 0 give bot, mid, top
}
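Since the question is in Python, here is a rough Python equivalent of the selection above. It is only a sketch: `pick_grey` is a name made up for this example, and it sorts the three channels instead of using the mask lookup table.

```python
def pick_grey(pixel, bri):
    """Return the smallest (bri=0), middle (bri=1) or largest (bri=2) channel."""
    return sorted(pixel)[bri]


pixel = (10, 200, 50)
print(pick_grey(pixel, 0), pick_grey(pixel, 1), pick_grey(pixel, 2))
# 10 50 200
```

As with the C version, there is no defensive checking: `bri` is assumed to be 0, 1 or 2 and `pixel` to hold exactly three values.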
Steve J
  • This is just an attempt at indirectly self-promoting your own software. – Adi219 Aug 22 '18 at 12:29
  • How can that be, if I have not identified the software? Actually I am pointing out that the discussion so far has focussed on correctness, but often it is user choice that is important. – Steve J Aug 23 '18 at 13:08
  • You literally mention two use cases which use different methods, then state that your software allows you to choose which method you want. That's quite clearly indirect self-promotion, as anybody who wants to find out more is essentially going to be asking for your software, as there are no other significant points in your answer. – Adi219 Aug 23 '18 at 13:16
  • Sorry, I thought I had given enough information for people to implement the procedure. I have edited my post to include the code. – Steve J Aug 24 '18 at 16:42
-2

Using a rough greyscale algorithm can result in several problems. Firstly, the output may lack smooth transitions between shades of grey, leading to a visually jagged or pixelated appearance. This can make the image look unnatural and lose details. Secondly, the algorithm may not accurately represent the original image's tonal values, causing a loss of contrast and subtle variations in brightness. Additionally, a rough greyscale algorithm might not take into account color perception, leading to inaccurate conversions. Lastly, the algorithm may not consider the specific characteristics of the input image, resulting in a generic or one-size-fits-all approach that fails to capture the nuances of individual images.

Limited Dynamic Range: A rough greyscale algorithm might not properly capture the full range of tones in an image. This can result in loss of detail in the highlights and shadows, leading to a flat or washed-out appearance.

Loss of Texture and Detail: The algorithm's lack of precision can cause the loss of fine textures and intricate details in the image. This can impact the overall quality and visual fidelity of the greyscale conversion.

Inconsistent Results: Different images may require different adjustments and optimizations to achieve a satisfactory greyscale representation. A rough algorithm may not account for these variations, leading to inconsistent and suboptimal results across different images.

Artifacts and Noise: The algorithm's simplifications and approximations can introduce artifacts and noise into the greyscale output. These artifacts can manifest as unwanted patterns, irregularities, or pixel distortions, degrading the image quality.

Insensitivity to Image Content: A rough greyscale algorithm may not consider the specific content of the image, such as the subject matter or composition. As a result, important elements may not be adequately highlighted or differentiated, affecting the overall visual impact. This can also be observed in tools such as image color pickers and color wheels.

Lack of User Control: Users may have limited control over the algorithm's parameters and adjustments. This can restrict their ability to fine-tune the greyscale conversion according to their artistic vision or specific requirements.

It's worth noting that the severity of these problems can vary depending on the specific algorithm being used and the quality of implementation. However, these are some common challenges associated with rough greyscale algorithms.