I have a task where I need to detect whether documents (PDF/Word) are in colour or not. As I don't have access to the data, I am working on the assumption that these documents will need to be converted to JPEG for processing.
The initial direction I was given was to use a pre-trained image classification model, but I felt this was overkill and have been investigating whether I could inspect the image pixels instead, which I assumed would be quicker and less compute-intensive. My idea was to classify images as grayscale/B&W, and anything that did not meet this criterion would be considered colour.
From the articles I have read, I understand that a grayscale image is one that either has a single channel or, if it has 3 channels, has R == G == B for all pixels. I produced code based on this thread. However, when inspecting my images, some that a layman would consider 'not colour' did not have equal R, G, B values. Example images can be seen below:
- https://www.pexels.com/photo/monochrome-photo-of-high-rise-buildings-2539658/
- https://unsplash.com/photos/--AS0fm7E88
- https://unsplash.com/photos/I6Rlq3H_ca0
- https://www.pexels.com/photo/grayscale-photo-of-building-2817869/
What I typically found for these cases was that R would be around 10 away from G, while G and B would be equal or very close. After inspecting the RGB values for some of these images, I therefore implemented arbitrary tolerances of 10 for R-G and 3 for B-G. Just looking at these colours, they do still tend to be shades of grey. I feel my approach is not necessarily the most robust, so I was looking to gain a better understanding of this scenario.
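For reference, the tolerance check described above can be sketched roughly as follows. This is a minimal illustration rather than my exact code: the function name `is_greyscale` and its parameters are made up for this post, and it assumes the pixels are already available as (R, G, B) integer tuples (for example from Pillow after converting the image to RGB mode). Setting both tolerances to 0 gives the strict R == G == B test.

```python
def is_greyscale(pixels, rg_tol=10, bg_tol=3):
    """Return True if every (R, G, B) pixel is within the given tolerances.

    pixels: iterable of (R, G, B) integer tuples (assumed input format).
    rg_tol: maximum allowed |R - G| (10 is the arbitrary value from my tests).
    bg_tol: maximum allowed |B - G| (3 is the arbitrary value from my tests).
    """
    return all(
        abs(r - g) <= rg_tol and abs(b - g) <= bg_tol
        for r, g, b in pixels
    )


# Strict check: only exact R == G == B passes.
print(is_greyscale([(120, 120, 120), (0, 0, 0)], rg_tol=0, bg_tol=0))

# A slightly warm grey passes with the tolerances, but fails the strict check.
print(is_greyscale([(130, 120, 121)]))
print(is_greyscale([(130, 120, 121)], rg_tol=0, bg_tol=0))
```

Note that a single pixel outside the tolerances makes the whole image count as colour, which is part of why I am not sure the approach is robust.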
My questions are:
- Is there a more robust way to deal with such cases than hardcoded, arbitrary tolerance values? Or, if this is a valid approach, are there any articles or papers that could help me justify it?
- Why is it that these images don't have R = G = B? I can understand slight noise (+/-3), but I am unsure why R is so much further away from G and B.
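To put numbers on the second question, something like the sketch below could summarise how far R and B sit from G on average. The helper name `channel_offsets` is hypothetical, and it again assumes the pixels are (R, G, B) integer tuples. My thinking is that a consistent signed offset, rather than values scattered around zero, would point to something systematic rather than random noise, but that is an assumption on my part.

```python
from statistics import mean


def channel_offsets(pixels):
    """Return (mean of R - G, mean of B - G) as signed values.

    pixels: non-empty iterable of (R, G, B) integer tuples (assumed format).
    Raises statistics.StatisticsError if no pixels are given.
    """
    pixels = list(pixels)  # materialise so we can iterate twice
    rg_offset = mean(r - g for r, g, _ in pixels)
    bg_offset = mean(b - g for _, g, b in pixels)
    return rg_offset, bg_offset


# Two made-up pixels where R is consistently 10 above G and B hovers around G.
rg, bg = channel_offsets([(130, 120, 121), (110, 100, 99)])
print(rg, bg)
```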
As explained, I tried implementing code based on the linked article. I was expecting to see R = G = B for grayscale images, but it seems this is not always the case.