
I have a task where I need to detect whether documents (PDF/Word) are in colour or not. As I don't have access to the data, I am working on the assumption that these documents will need to be converted to JPEG for processing.

The initial direction I was given was to use a pre-trained image classification model, but I felt this was overkill and have been investigating inspecting the image pixels instead, which I assumed would be quicker and less compute-intensive. My idea was to classify images as grayscale/BW, and anything that did not meet this criterion would be considered colour.

From the articles I have read, I understand a grayscale image is one that has a single channel or, if it has 3 channels, R==G==B for all pixels. I produced code based on this thread; however, when inspecting my images, I found that images a layman would consider 'not colour' did not always have equal R, G, B values. Example images can be seen below:

  1. https://www.pexels.com/photo/monochrome-photo-of-high-rise-buildings-2539658/
  2. https://unsplash.com/photos/--AS0fm7E88
  3. https://unsplash.com/photos/I6Rlq3H_ca0
  4. https://www.pexels.com/photo/grayscale-photo-of-building-2817869/

What I typically found in these cases was that R would be around 10 away from G and B, while G and B would be equal or very close. After inspecting the RGB values of some of these images, I therefore implemented arbitrary tolerances of 10 for R-G and 3 for B-G; just looking at these colours, they do still tend to be shades of grey. I feel my approach is not necessarily the most robust, so I was looking to gain a better understanding of this scenario.
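The tolerance-based check described above could be sketched roughly as follows, assuming Pillow and NumPy. The tolerances (10 for R-G, 3 for B-G) mirror the values mentioned above and are illustrative rather than validated:

```python
import numpy as np
from PIL import Image

def is_grayscale(path, rg_tol=10, bg_tol=3):
    """Return True if every pixel's channels stay within the given tolerances."""
    img = Image.open(path)
    # Single-channel modes are grayscale by definition
    if img.mode in ("1", "L", "LA"):
        return True
    arr = np.asarray(img.convert("RGB")).astype(int)
    r, g, b = arr[..., 0], arr[..., 1], arr[..., 2]
    # Treat the image as "not colour" only if ALL pixels are within tolerance
    return bool((np.abs(r - g) <= rg_tol).all() and (np.abs(b - g) <= bg_tol).all())
```

A stricter variant could require only some percentage of pixels to pass, which would be more robust to isolated noisy pixels or JPEG artefacts.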

My questions are:

  1. Is there a more robust way to deal with such cases than hardcoded, arbitrary tolerance values? Or, if this is a valid approach, are there any articles or papers that could help me justify it?
  2. Why don't these images have R=G=B? I can understand slight noise (+/-3), but I am unsure why R is so much further from G and B.

As explained, I tried implementing code based on the linked article. I was expecting to see R=G=B for grayscale images, but it seems that is not always the case.

  • How about transforming to HSV (see https://stackoverflow.com/questions/54970416/is-there-a-way-of-converting-an-image-to-hsl-using-pillow) and defining "non-colour" images as those where the S (Saturation) value doesn't exceed some threshold? (You might want to try a few different approaches - e.g. do you care about the average saturation, or the maximum, etc) – slothrop Apr 21 '23 at 10:08
  • Are there any values of S that are typically used for classifying "non-colour" or grayscale images? The rule of using R==G==B for greyscale is why RGB seemed like a good choice, and using the tolerances could account for the noise, so that rule may still stand true (Idk for sure). Also, I had a play with some colour pickers, but I don't believe fixing S alone would be sufficient, as there are some shades of black with 100% saturation which may be misclassified. I imagine it would have to be a series of ranges, but it would be good to know if there is a standard here too – qwertuestions Apr 21 '23 at 12:53
  • Good question - I wouldn't call myself an expert on this, and ultimately it's somewhat subjective, but there's another answer here which is interesting: https://stackoverflow.com/a/56571378/765091 – slothrop Apr 21 '23 at 13:48
  • Personally, I would do my very best to avoid JPEG when looking at colour. It does *"chroma subsampling"* which makes an almighty mess if you are looking at colour analysis. – Mark Setchell Apr 22 '23 at 10:14
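The HSV approach suggested in the comments could be sketched as below, again assuming Pillow and NumPy. Pillow's `convert("HSV")` scales saturation to 0–255. The value-based masking addresses the "saturated black" concern raised above: hue and saturation are numerically unstable for near-black pixels, so they are ignored. Both thresholds are placeholder assumptions to tune, not standard values:

```python
import numpy as np
from PIL import Image

def max_saturation(path, black_value=16):
    """Maximum saturation over the image, ignoring near-black pixels."""
    hsv = np.asarray(Image.open(path).convert("HSV"))
    s = hsv[..., 1].astype(int)
    v = hsv[..., 2].astype(int)
    # Zero out saturation where the pixel is almost black, since a
    # "100% saturated black" still looks grey/black to a viewer
    s = np.where(v < black_value, 0, s)
    return int(s.max())

def is_grayscale_hsv(path, sat_threshold=25):
    return max_saturation(path) <= sat_threshold
```

As the comment notes, it may be worth comparing the maximum against the mean saturation (or a high percentile), since a single stray colourful pixel can push the maximum up on an otherwise monochrome scan.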

0 Answers