Detecting anti-aliased or undersampled text image

Question

I have an image that is essentially a black-and-white text document, but because of anti-aliasing/undersampling during scanning it contains many colored, light-toned pixels and is therefore saved as a full-color image, i.e. it takes up a lot of space.

My goal is to detect images that are candidates for black and white, so that I can convert them from full color to B&W, which dramatically reduces their size.

Is there a way to detect such anti-aliased/undersampled images? Counting colored pixels doesn't help, because there end up being nearly as many colored pixels as black ones. Essentially, I want to detect that the colored pixels come from anti-aliasing/undersampling a black & white image and not from a picture-type image.

Here is an example image:

As you can see there are many more colors than just black. However this image is a good candidate for Black & White / Greyscale conversion instead of full color. How can I detect such images? Please note that in this example the colors tend to be on the grey side but there are many cases where they are cyan or brown etc.

scanning documents does not involve anti-aliasing. the colour effects you describe result from undersampling and chromatic aberrations — Piglet, Apr 12 '17 at 08:51
I still don't get why you cannot simply convert the image to grayscale — Piglet, Apr 12 '17 at 08:59
This is not a conversion problem. I want to *detect* if the image is a Text Image, thus rendering all the color pixels (due to undersampling) redundant and *then* decide to convert it. It *could* be a picture type image and then i don't want to convert it to black and white. — PentaKon, Apr 12 '17 at 09:14
Possible duplicate of [Algorithm to detect presence of text on image](http://stackoverflow.com/questions/4606274/algorithm-to-detect-presence-of-text-on-image) — Piglet, Apr 12 '17 at 09:21
I don't care about *presence* of text. I want to understand if it is text *only* and if the color pixels come from text undersampling/anti-aliasing — PentaKon, Apr 12 '17 at 09:35
may I quote: "I want to detect if the image is a Text Image"... it should be fairly easy to classify pixels by looking at their neighbourhood. does the pixel have a lot of black and white neighbours? are those pixels grouped in a rectangular shape? is that shape large enough to be a picture?... I suggest you provide a few sample images to stop the guess work. also your question will be downvoted and closed if you do not provide a few own ideas or attempts to solve the problem. — Piglet, Apr 12 '17 at 09:54
Please provide some images else the question is far too broad. — Mark Setchell, Apr 12 '17 at 10:17

repo · Answer 1 · 2022-03-05T01:44:40.333

I think it is a valid question. I don't have 50 reputation to post a comment so I will post this as an answer.

Basically, in a black-and-white anti-aliased image, the various grey levels are just black at different opacities. If we inspect those pixels, they look like the values listed below. So, if the operation is a color manipulation, apply the opacity picked up from each grey pixel to the new color.

rgba(0,0,0,0.6)
rgba(0,0,0,0.9)
rgba(0,0,0,0.5)
rgba(0,0,0,0.9)
rgba(0,0,0,0.6)
rgba(0,0,0,0.1)
rgba(0,0,0,0.5)
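
As a rough illustration of that mapping, here is a minimal Python sketch. It assumes 8-bit grey levels on a pure white background; the function names `gray_to_alpha` and `composite` are made up for this example and do not come from any library.

```python
def gray_to_alpha(gray: int) -> float:
    """Opacity of black over white that would produce this 8-bit grey level."""
    return 1.0 - gray / 255.0


def composite(color, alpha, background=(255, 255, 255)):
    """Blend `color` over `background` at the given opacity, channel by channel."""
    return tuple(
        round(alpha * c + (1.0 - alpha) * b)
        for c, b in zip(color, background)
    )


# A grey level of 102 corresponds to rgba(0,0,0,0.6) over white.
alpha = gray_to_alpha(102)

# Reuse the same opacity with a new ink color (blue here), so the
# anti-aliased edge keeps its softness after the color change.
recolored = composite((0, 0, 255), alpha)
```

If the background is not white, the formula for `gray_to_alpha` no longer holds as written and would need the actual paper color.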

In my opinion, the non-grey pixels in this example image (the cyan and brown ones) can be safely ignored, because they do not seem to be part of the original text. A few more example images with non-grey pixels would have helped. But if we cannot ignore them, we just need to get each pixel's opacity and apply the same color manipulation. In other words, we treat them as black pixels.
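
One way to "get the pixel opacity" of a tinted pixel is to go through its luminance. The sketch below is only an assumption about how that could be done: it uses the Rec. 601 luma weights (0.299, 0.587, 0.114) and a white background, and the helper names are hypothetical.

```python
def pixel_to_black_alpha(rgb) -> float:
    """Treat a tinted pixel as black ink: derive its opacity from its luminance."""
    r, g, b = rgb
    luma = 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights (an assumption)
    return 1.0 - luma / 255.0


def to_greyscale(pixels):
    """Replace every pixel by the grey level that black at that opacity gives on white."""
    return [
        [round(255 * (1.0 - pixel_to_black_alpha(p))) for p in row]
        for row in pixels
    ]


# One row: solid black, a cyan-tinted fringe pixel, and white paper.
row = [(0, 0, 0), (100, 200, 200), (255, 255, 255)]
grey_row = to_greyscale([row])[0]
```

The cyan-tinted pixel ends up as a mid grey, so the color fringe is dropped while the edge softness is kept.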
