How to detect if an image is a photo, clip art or a line drawing?

Question

What is the best way to identify an image's type? rwong's answer on this question suggests that Google segments images into the following groups:

Photo - continuous-tone
Clip art - smooth shading
Line drawing - bitonal

What is the best strategy for classifying an image into one of those groups? I'm currently using Java but any general approaches are welcome.

Thanks!

Update:

I tried the unique colour counting method that tyjkenn mentioned in a comment and it seems to work for about 90% of the cases that I've tried. In particular black and white photos are hard to correctly detect using unique colour count alone.
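For reference, the unique colour counting idea can be sketched roughly as follows. This is a minimal illustration, not the exact code I ran: the function name `classify_by_colour_count` and the cut-offs `line_max` and `clip_max` are assumptions chosen for the example and would need tuning against real data.

```python
def classify_by_colour_count(pixels, line_max=2, clip_max=256):
    """Guess an image type from the number of distinct colours.

    pixels   -- iterable of hashable colour values, e.g. (r, g, b) tuples
    line_max -- hypothetical cut-off: at most this many colours => line drawing
    clip_max -- hypothetical cut-off: at most this many colours => clip art
    """
    unique = len(set(pixels))  # a set keeps one entry per distinct colour
    if unique <= line_max:
        return "line drawing"
    if unique <= clip_max:
        return "clip art"
    return "photo"
```

With Pillow, the pixel sequence could come from something like `img.convert('RGB').getdata()`. A greyscale photo has at most 256 distinct values, so it lands in the clip art bucket here, which matches the failure described above.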

Getting the image histogram and counting the peaks alone doesn't seem like it will be a viable option. For example, this image only has two peaks:

Here are two more images I've checked out:

That's an interesting question. Maybe you could base it off the number of different colors. I assume line drawings would only have two, clip art would have a few more, and a photo would have millions. Then you could just loop through the pixels, counting unique colors, and classify it that way. — tyjkenn, Feb 20 '12 at 00:35
@tyjkenn that's an interesting strategy that I might fall back to. Unfortunately some of the data I need to deal with will only have 256 colours and that could easily be used up by a clip art gradient. — Luke Quinane, Feb 20 '12 at 01:01
Might belong to Theoretical CS StackExchange: http://cstheory.stackexchange.com/ — Adam Matan, Feb 20 '12 at 07:59

Simon Steinberger · Answer 1 · 2017-05-03T11:08:58.623

Here are some rather simple but effective approaches to differentiate between drawings and photos. Use them in combination to achieve the best accuracy:

1) Mime type or file extension

PNGs are typically clip arts or drawings, while JPEGs are mostly photos.

2) Transparency

If the image has an alpha channel, it's most likely a drawing. If an alpha channel exists, you can additionally iterate over all pixels to check whether transparency is actually used. Here is a Python example:

from PIL import Image

img = Image.open('test.png')
transparency = False
# Only modes that can carry alpha information are worth scanning.
if img.mode in ('RGBA', 'RGBa', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
    if img.mode != 'RGBA':
        img = img.convert('RGBA')
    # True if any pixel's alpha value is below the (tunable) threshold of 220.
    transparency = any(px[3] < 220 for px in img.getdata())

print('Transparency:', transparency)

3) Color distribution

Clip arts often have regions with identical colors. If a few colors make up a significant part of the image, it's more likely a drawing than a photo. This code outputs the share of the image area (as a fraction between 0 and 1) that is covered by the ten most used colors (Python example):

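from PIL import Image

img = Image.open('test.jpg')
# Image.LANCZOS replaces Image.ANTIALIAS, which was removed in Pillow 10.
img.thumbnail((200, 200), Image.LANCZOS)
w, h = img.size
# getcolors() returns (count, color) pairs; sum the ten largest counts.
colors = sorted(img.convert('RGB').getcolors(w * h), key=lambda x: x[0], reverse=True)
print(sum(count for count, _ in colors[:10]) / float(w * h))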

You need to adapt and optimize those values. Are ten colors enough for your data? What percentage works best for you? Find out by testing a larger number of sample images. 30% or more is typically clip art. That does not hold for sky photos and the like, though. Therefore, we need another method, which is the next one.

4) Sharp edge detection via FFT

Sharp edges result in high frequencies in a Fourier spectrum. And typically such features are more often found in drawings (another Python snippet):

from PIL import Image
import numpy as np

img = Image.open('test.jpg').convert('L')
# Magnitude of every 2-D Fourier coefficient, flattened into one array.
values = np.abs(np.fft.fft2(np.asarray(img))).flatten()
# Percentage of coefficients whose magnitude exceeds the (tunable) threshold.
high_values_ratio = 100 * float(np.count_nonzero(values > 10000)) / values.size
print(high_values_ratio)

This code gives you the percentage of frequency components whose magnitude is above the threshold (10000 in the snippet). Again: optimize such numbers according to your sample images.

Combine and optimize these methods for your image set. Let me know if you can improve this - or just edit this answer, please. I'd like to improve it myself :-)

score 4 · Answer 2 · answered Feb 20 '12 at 07:12

Histograms would be a first way to do this.
Convert the color image to grayscale and calculate the histogram. A strongly bimodal histogram with 2 sharp peaks in black (or dark) and white (or bright), probably with much more white, is a good indication of a line drawing.
If you have just a few more peaks then it is likely a clip-art type image.
Otherwise it's a photo.
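A minimal sketch of that peak-counting rule, in plain Python, might look like this. The helper names and the thresholds (`min_share`, `min_gap`, `clip_max_peaks`, and the 64/192 dark/bright limits) are assumptions made up for illustration, and as the question's update points out, some photos also show only two peaks, so this is a heuristic at best.

```python
def gray_histogram(gray_pixels):
    """256-bin histogram of grayscale values in the range 0..255."""
    hist = [0] * 256
    for value in gray_pixels:
        hist[value] += 1
    return hist


def find_peaks(hist, min_share=0.05, min_gap=10):
    """Return bins that are local maxima holding at least min_share of all
    pixels. Peaks closer together than min_gap bins are merged, keeping the
    taller one. Both thresholds are illustrative assumptions."""
    total = sum(hist)
    peaks = []
    for i, count in enumerate(hist):
        if total == 0 or count < min_share * total:
            continue
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i < len(hist) - 1 else 0
        if count >= left and count >= right:
            if peaks and i - peaks[-1] < min_gap:
                if count > hist[peaks[-1]]:
                    peaks[-1] = i  # replace the nearby, shorter peak
            else:
                peaks.append(i)
    return peaks


def classify_by_histogram(gray_pixels, clip_max_peaks=8):
    """Two peaks (one dark, one bright) => line drawing,
    a few peaks => clip art, anything else => photo."""
    peaks = find_peaks(gray_histogram(gray_pixels))
    if len(peaks) == 2 and peaks[0] < 64 and peaks[1] > 192:
        return "line drawing"
    if 1 <= len(peaks) <= clip_max_peaks:
        return "clip art"
    return "photo"
```

The grayscale values could be taken from an image converted to 8-bit grayscale; a smooth histogram has no single bin above `min_share`, so it yields no peaks and falls through to "photo".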

score 4 · Accepted Answer · answered Feb 21 '12 at 17:28

This problem can be solved by image classification and that's probably Google's solution to the problem. Basically, what you have to do is (i) get a set of images labeled into 3 categories: photo, clip-art and line drawing; (ii) extract features from these images; (iii) use the image's features and label to train a classifier.

Feature Extraction:

In this step you have to extract visual information that may be useful for the classifier to discriminate between the 3 categories of images:

A very basic yet useful visual feature is the image histogram and its variants. For example, the gray level histogram of a photo is probably smoother than a histogram of a clipart, where you have regions that may be all of the same color value.
Another feature that one can use is to convert the image to the frequency domain (e.g. using FFT or DCT) and measure the energy of the high-frequency components. Because line drawings will probably have sharp color transitions, their high-frequency components will tend to accumulate more energy.

There are also a number of other feature extraction algorithms that may be used.
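As a rough sketch of what such a feature vector could look like, the plain-Python code below computes two numbers per image. Everything here is an assumption for illustration: the function names are made up, "roughness" is just one possible way to quantify how smooth a histogram is, and the second feature uses squared differences between neighbouring pixels as a simple spatial-domain stand-in for high-frequency energy (a first-difference filter is a high-pass filter), not an actual FFT or DCT.

```python
def histogram_roughness(gray_rows):
    """Sum of absolute differences between adjacent bins of the normalised
    256-bin histogram. Smooth histograms score near 0; a single isolated
    spike scores 2."""
    hist = [0] * 256
    total = 0
    for row in gray_rows:
        for value in row:
            hist[value] += 1
            total += 1
    if total == 0:
        return 0.0
    share = [count / float(total) for count in hist]
    return sum(abs(share[i + 1] - share[i]) for i in range(255))


def neighbour_difference_energy(gray_rows):
    """Mean squared difference between horizontally and vertically adjacent
    pixels, scaled to 0..1. Stand-in for high-frequency energy."""
    height = len(gray_rows)
    width = len(gray_rows[0]) if height else 0
    total, count = 0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                d = gray_rows[y][x + 1] - gray_rows[y][x]
                total += d * d
                count += 1
            if y + 1 < height:
                d = gray_rows[y + 1][x] - gray_rows[y][x]
                total += d * d
                count += 1
    return total / (count * 255.0 ** 2) if count else 0.0


def extract_features(gray_rows):
    """Feature vector for one image given as rows of 0..255 gray values."""
    return [histogram_roughness(gray_rows),
            neighbour_difference_energy(gray_rows)]
```

Here `gray_rows` is a list of rows of grayscale values; in practice more features would be appended to the same list.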

Training a Classifier:

After the feature extraction phase, we will have for each image a vector of numeric values (let's call it the image feature vector) and its label. These (feature vector, label) pairs are a suitable input for training a classifier. As for the classifier, one may consider neural networks, SVMs and others.

Classification:

Now that we have a trained classifier, to classify an image (i.e. detect an image category) we simply have to extract its features and feed them to the classifier, which will return the predicted category.
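To make the train-then-classify flow concrete, here is a minimal sketch. It deliberately uses a nearest-centroid classifier instead of a neural network or SVM, only because that fits in a few lines of plain Python; the function names and the toy feature values are made up for illustration and are not measurements from real images.

```python
def train_nearest_centroid(samples):
    """samples: list of (feature_vector, label) pairs.
    Returns a model mapping each label to the mean of its feature vectors."""
    sums, counts = {}, {}
    for vector, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(model, vector):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], vector))


# Toy training set with made-up two-dimensional feature vectors.
training = [
    ([0.10, 0.05], "photo"), ([0.20, 0.10], "photo"),
    ([1.00, 0.30], "clip art"), ([1.20, 0.40], "clip art"),
    ([1.90, 0.90], "line drawing"), ([2.00, 0.80], "line drawing"),
]
model = train_nearest_centroid(training)
predicted = classify(model, [1.05, 0.33])  # closest to the clip art centroid
```

A real system would swap in a stronger classifier and a much larger labeled set, but the shape of the pipeline (features in, label out) stays the same.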

score 1 · Answer 4 · answered Feb 23 '12 at 05:40

In addition to color histograms, also consider edge information and the consistency of line widths throughout the image.

Photo - natural edges will have a variety of edge strengths, and it's less likely that there will be many parallel edges.

Clip art - A watershed algorithm could help identify large, connected regions of consistent brightness. In clip art and synthetic images designed for high visibility there are more likely to be perfectly straight lines and parallel lines. A histogram of edge strengths is likely to have a few very strong peaks.

Line drawing - synthetic lines are likely to have very consistent width. The Stroke Width Transform could help you identify strokes. (One of the basic principles is to find edge gradients that "point at" each other.) A histogram of edge strengths may have only one strong peak.
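As a very rough illustration of the "consistent line width" idea, the sketch below measures horizontal runs of ink pixels in a binarized image and reports how much their lengths vary. This is not the Stroke Width Transform, only a much simpler stand-in, and it only reflects stroke width for roughly vertical strokes (a horizontal stroke produces one long run). The function names are hypothetical.

```python
from statistics import mean, pstdev


def horizontal_run_lengths(binary_rows):
    """Lengths of consecutive runs of ink pixels (truthy values) in each row."""
    runs = []
    for row in binary_rows:
        length = 0
        for pixel in row:
            if pixel:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that touches the right border
            runs.append(length)
    return runs


def stroke_width_variation(binary_rows):
    """Coefficient of variation of the run lengths (std / mean).
    Values near 0 suggest uniform stroke width; returns None without ink."""
    runs = horizontal_run_lengths(binary_rows)
    if not runs:
        return None
    return pstdev(runs) / mean(runs)
```

A low value would count as evidence for a line drawing; what counts as "low" has to be determined from sample images.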
