31

I am building an ASP.NET web site where the users may upload photos of themselves. There could be thousands of photos uploaded every day. One thing my boss has asked a few times is whether there is any way we could detect if any of the photos are showing too much 'skin' and automatically flag these as 'Adults Only' before the editors make the final decision.

George Stocker
Craig

14 Answers

37

Your best bet is to deal with the image in the HSV colour space (see here for RGB to HSV conversion). The colour of skin is pretty much the same between all races; it's just the saturation that changes. By dealing with the image in HSV you can simply search for the colour of skin.

You might do this by simply counting the number of pixels within a colour range, or you could perform region growing around pixels of that colour to calculate the size of the skin-coloured areas.

Edit: for dealing with grainy images, you might want to apply a median filter to the image first, and then reduce the number of colours to segment the image. You will have to play around with the settings on a large set of pre-classified (adult or not) images and see how the values behave to get a satisfactory level of detection.
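
The sort of median filter I mean could look something like this (untested sketch; GetPixel/SetPixel is slow, so for production volumes you'd want LockBits instead, and a per-channel median isn't a true vector median but is usually good enough for denoising before segmentation):

Bitmap MedianFilter(Bitmap src)
{
    Bitmap dst = new Bitmap(src.Width, src.Height);

    for (int y = 1; y < src.Height - 1; y++)
    {
        for (int x = 1; x < src.Width - 1; x++)
        {
            // Collect the 3x3 neighbourhood for each channel.
            List<byte> r = new List<byte>(), g = new List<byte>(), b = new List<byte>();
            for (int dy = -1; dy <= 1; dy++)
            {
                for (int dx = -1; dx <= 1; dx++)
                {
                    Color c = src.GetPixel(x + dx, y + dy);
                    r.Add(c.R); g.Add(c.G); b.Add(c.B);
                }
            }

            // The middle element of the sorted 9 values is the median.
            r.Sort(); g.Sort(); b.Sort();
            dst.SetPixel(x, y, Color.FromArgb(r[4], g[4], b[4]));
        }
    }

    return dst;
}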

EDIT: Here's some code that should do a simple count (I haven't tested it; it's a quick mashup of some code from here and the RGB to HSL conversion here).

// Needs to be compiled with unsafe code allowed (/unsafe) because of the raw pointer access.
Bitmap bmp = new Bitmap(_image);
BitmapData bData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite, bmp.PixelFormat);
byte bitsPerPixel = GetBitsPerPixel(bData.PixelFormat);
byte* scan0 = (byte*)bData.Scan0.ToPointer();

int count = 0;

for (int i = 0; i < bData.Height; ++i)
{
    for (int j = 0; j < bData.Width; ++j)
    {
        byte* data = scan0 + i * bData.Stride + j * bitsPerPixel / 8;

        // Pixel data is laid out BGR(A) in memory.
        byte r = data[2];
        byte g = data[1];
        byte b = data[0];

        int max = Math.Max(r, Math.Max(g, b));
        int min = Math.Min(r, Math.Min(g, b));

        int h;

        if (max == min)
            h = 0;
        else if (max == r)
            h = ((60 * (g - b)) / (max - min) + 360) % 360;
        else if (max == g)
            h = (60 * (b - r)) / (max - min) + 120;
        else // max == b
            h = (60 * (r - g)) / (max - min) + 240;

        if (h > _lowerThresh && h < _upperThresh)
            count++;
    }
}
bmp.UnlockBits(bData);
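
GetBitsPerPixel, _lowerThresh and _upperThresh aren't defined above; here's roughly what I have in mind (the threshold values are purely illustrative guesses, you'd tune them against your pre-classified image set):

private static byte GetBitsPerPixel(PixelFormat format)
{
    // Image.GetPixelFormatSize returns the colour depth in bits for a PixelFormat.
    return (byte)Image.GetPixelFormatSize(format);
}

// Illustrative hue window (degrees) for skin tones - not tuned values.
private int _lowerThresh = 2;
private int _upperThresh = 50;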
Andrew Bullock
  • Very important, of course, would be to make sure the editors are quick to review suspect images, because you're probably going to get lots of false positives. – Kip Nov 06 '08 at 14:18
34

Of course, this will fail for the first user who posts a close-up of someone's face (or hand, or foot, or whatnot). Ultimately, all these forms of automated censorship will fail until there's a real paradigm-shift in the way computers do object recognition.

I'm not saying that you shouldn't attempt it nonetheless; but I do want to point out these problems. Do not expect a perfect (or even good) solution. It doesn't exist.

Konrad Rudolph
  • Most probably a good compromise is to implement a high-sensitivity people detector, so that only absolutely-not-porn is accepted by the computer, and everything else (hopefully a much smaller fraction of the total) should be reviewed by a human classifier. – heltonbiker Dec 13 '12 at 16:08
  • @heltonbiker And then you get PR disasters like Facebook who recently blocked images of people that *looked* naked even though they weren’t (mind you, Facebook uses *human* moderators rather than a software solution). And besides bad PR, this simply smacks of censorship. Each their own but if I were required to implement a similar solution that favours false positives rather than false negatives this might be grounds for resignation. – Konrad Rudolph Dec 13 '12 at 16:38
21

I doubt that there exists any off-the-shelf software that can determine if the user uploads a naughty picture. Your best bet is to let users flag images as 'Adults Only' with a button next to the picture. (Clarification: I mean users other than the one who uploaded the picture--similar to how posts can be marked offensive here on StackOverflow.)

Also, consider this review of an attempt to do the same thing in a dedicated product: http://www.dansdata.com/pornsweeper.htm.

Link stolen from today's StackOverflow podcast, of course :).

JSBձոգչ
  • Do you really trust users of a site to check the 'evil bit' when they upload an image that is questionable? – Peter M Nov 04 '08 at 20:58
  • I think he means that other users will flag it as offensive / adult only. (And a copy will be sent to me =D) – StingyJack Nov 04 '08 at 20:59
  • @StingyJack or implement a list of users to send it all =) – Seiti Nov 04 '08 at 22:03
  • There is off-the-shelf free software for that here: https://github.com/EugenCepoi/nsfw_api :) – eugen Oct 14 '18 at 06:34
  • @eugen This answer is almost 10 years old, and I suspect that the explosion of ML in the meantime means that this is now a much more feasible proposition! – JSBձոգչ Oct 16 '18 at 07:58
15

We can't even write filters that detect dirty words accurately in blog posts, and your boss is asking for a porno detector? CLBUTTIC!

Tim Howland
  • I know it's not easy, but I am sure large dating sites such as match.com use some kind of detection. And there will be a second level of human editors to check for false positives. – Craig Nov 04 '08 at 21:01
  • It's all good until they try automatically drawing clothes on the pics; which is what screws most people up. – NotMe Nov 04 '08 at 21:33
  • I think you are buttuming that the same algorithm is used for pictures and words. People like you should be buttbuttinated (which strangely sounds worse than the original word, reminds me of the death by bongo-bongo joke :-)). – Tim Ring Nov 10 '08 at 09:32
11

I would say your answer lies in crowdsourcing the task. This almost always works and tends to scale very well.

It doesn't have to involve making some users into "admins" and coming up with different permissions - it can be as simple as enabling an "inappropriate" link near each image and keeping a count.
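
As a very rough sketch of the counting part (the threshold is made up, and in a real site the counts would live in the database rather than in memory):

class FlagTracker
{
    private const int FlagThreshold = 3;   // made-up number of reports before a photo is pulled for review
    private readonly Dictionary<int, int> _flags = new Dictionary<int, int>();
    private readonly object _lock = new object();

    // Called when someone clicks the "inappropriate" link; returns true once the
    // photo has collected enough reports to be hidden pending editor review.
    public bool ReportInappropriate(int photoId)
    {
        lock (_lock)
        {
            int current;
            _flags.TryGetValue(photoId, out current);
            _flags[photoId] = current + 1;
            return current + 1 >= FlagThreshold;
        }
    }
}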

conny
6

See the seminal paper "Finding Naked People" by Fleck/Forsyth published in ECCV. (Advanced).

http://www.cs.hmc.edu/~fleck/naked.html

graveca
5

Interesting question from a theoretical / algorithmic standpoint. One approach to the problem would be to flag images that contain large skin-coloured regions (as explained by Trull).

However, the amount of skin shown is not what determines an offensive image; it's rather the location of the skin shown. Perhaps you can use face detection (search for face-detection algorithms) to refine the results -- determine how large the skin regions are relative to the face, and whether they belong to the face (perhaps how far below it they are).
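
As a sketch of that heuristic (FindFaces and FindSkinRegions are placeholders for whatever face detector and skin segmentation you end up using - they are not real APIs - and the ratio is invented purely for illustration):

bool LooksSuspicious(Bitmap image)
{
    // Placeholders: plug in your detector of choice here.
    List<Rectangle> faces = FindFaces(image);
    List<Rectangle> skin = FindSkinRegions(image);

    if (faces.Count == 0)
        return skin.Count > 0;            // skin regions but no face: send to an editor

    double faceArea = 0, skinArea = 0;
    foreach (Rectangle f in faces) faceArea += (double)f.Width * f.Height;
    foreach (Rectangle s in skin) skinArea += (double)s.Width * s.Height;

    // Invented ratio: flag when the skin area is much larger than the visible faces explain.
    return skinArea > 4.0 * faceArea;
}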

dbkk
3

I know either Flickr or Picasa has implemented this. I believe the routine was called FleshFinder.

A tip on the architecture of doing this:

Run this as a Windows service, separate from the ASP.NET pipeline. Instead of analyzing images in real time, create a queue of newly uploaded images for the service to work through.

You can use the normal System.Drawing stuff if you want, but if you really need to process a lot of images, it would be better to use native code and a high-performance graphics library, and P/Invoke the routine from your service.

As resources become available, process images in the background and flag suspicious ones for editor review. This should prune down the number of images to review significantly, while not annoying people who upload pictures of skin-colored houses.
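
A bare-bones sketch of the queue part (in-memory here for brevity; in practice the queue would probably be a database table the service polls, and the two placeholder methods stand in for whatever analysis and flagging you implement):

using System;
using System.Collections.Generic;
using System.Threading;

class ImageScanService
{
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly object _lock = new object();

    // Called by the upload handler (or a file watcher) for every new image.
    public void Enqueue(string imagePath)
    {
        lock (_lock) { _pending.Enqueue(imagePath); }
    }

    // Runs on the service's worker thread.
    public void ProcessLoop()
    {
        while (true)
        {
            string path = null;
            lock (_lock)
            {
                if (_pending.Count > 0) path = _pending.Dequeue();
            }

            if (path == null) { Thread.Sleep(1000); continue; }

            if (LooksLikeTooMuchSkin(path))     // whatever detector you settle on
                FlagForEditorReview(path);      // e.g. mark a row in the database
        }
    }

    // Placeholders for the actual analysis and flagging; not real APIs.
    private bool LooksLikeTooMuchSkin(string path) { return false; }
    private void FlagForEditorReview(string path) { }
}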

FlySwat
3

I would approach the problem from a statistical standpoint. Get a bunch of pictures that you consider safe, and a bunch that you don't (that will make for a fun day of research), and see what they have in common. Analyze them all for color range and saturation to see if you can pick out characteristics that all of the naughty photos, and few of the safe ones, have.
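
A crude way to start that comparison (GetHue and GetSaturation are the built-in System.Drawing.Color helpers; the sampling step and the hue cut-off are arbitrary, illustrative choices):

using System;
using System.Drawing;
using System.IO;

static void Summarise(string folder)
{
    double satTotal = 0, skinTotal = 0;
    int images = 0;

    foreach (string file in Directory.GetFiles(folder, "*.jpg"))
    {
        using (Bitmap bmp = new Bitmap(file))
        {
            double satSum = 0; int skin = 0, pixels = 0;
            for (int y = 0; y < bmp.Height; y += 4)        // sample every 4th pixel for speed
                for (int x = 0; x < bmp.Width; x += 4)
                {
                    Color c = bmp.GetPixel(x, y);
                    satSum += c.GetSaturation();
                    if (c.GetHue() < 50) skin++;            // crude, untuned "skin hue" cut-off
                    pixels++;
                }
            satTotal += satSum / pixels;
            skinTotal += (double)skin / pixels;
            images++;
        }
    }

    // Run once on the "safe" folder and once on the "naughty" folder and compare.
    Console.WriteLine("{0}: avg saturation {1:F2}, avg skin-hue fraction {2:F2}",
                      folder, satTotal / images, skinTotal / images);
}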

Bill the Lizard
  • This is an interesting point. I have heard people from Google say before that given enough data anything can be solved using statistics. Algorithms are not always required. For example, the spell check on Google.com is statistics-driven, not a spell-check algorithm. – Craig Nov 04 '08 at 21:54
  • This is kind of what I was getting at, just from the other approach. This is probably the starting point for what I suggested. Do a load of analysis first to give you some starting points for the suggested thresholds in your detector. – Andrew Bullock Nov 04 '08 at 22:27
  • I'm actually quite interested in this. If you can send me a fairly decent-sized set of test images, I'd have a play for you - you can happily have the code; I might SourceForge it as a library if it's any good. – Andrew Bullock Nov 04 '08 at 22:28
  • @Trull: You could probably sift through SO gravatars for images that are in the safe category. The internet is full of test images in the "naughty" category. :) – Bill the Lizard Nov 05 '08 at 14:59
1

Perhaps the Porn Breath Test would be helpful - as reported on Slashdot.

BIBD
1

Rigan Ap-apid presented a paper at WorldComp '08 on just this problem space. The paper is allegedly here, but the server was timing out for me. I attended the presentation of the paper and he covered comparable systems and their effectiveness as well as his own approach. You might contact him directly.

plinth
0

As mentioned above by Bill (and Craig's Google quote), statistical methods can be highly effective.

Two approaches you might want to look into are:

  • Neural Networks
  • Multivariate Analysis (MVA)

The MVA approach would be to get a "representative sample" of acceptable pictures and of unacceptable pictures. The X data would be an array of bytes from each picture; the Y would be assigned by you as 1 for unacceptable and 0 for acceptable. Create a PLS (partial least squares) model using this data, then run new data against the model and see how well it predicts the Y.

Rather than this binary approach you could have multiple Y's (e.g. 0=acceptable, 1=swimsuit/underwear, 2=pornographic)

To build the model you can look at open-source software, or there are a number of commercial packages available (although they are typically not cheap).
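
As a rough illustration of the data preparation only (the PLS fit itself is left to whichever package you pick, and the 64x64 resize and the use of grey levels are just illustrative choices):

using System.Drawing;

// Build the X row for one picture: resize to a fixed size so every row has the same
// length, then flatten the grey levels into a double array.
static double[] ToFeatureRow(string file, int size)
{
    using (Bitmap original = new Bitmap(file))
    using (Bitmap bmp = new Bitmap(original, new Size(size, size)))
    {
        double[] row = new double[size * size];
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                row[y * size + x] = bmp.GetPixel(x, y).GetBrightness();  // 0..1 grey level
        return row;
    }
}

// X: one row per training picture; Y: 0 = acceptable, 1 = unacceptable (or 0/1/2 as above).
// Fit a PLS model on (X, Y) with your chosen library, then check its predictions
// on a held-out set of pictures.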

Because even the best statistical approaches are not perfect, it would probably also be a good idea to include user feedback.

Good luck (and worst case you get to spend time collecting naughty pictures as an approved and paid activity!)

PTRMark
0

I'm afraid I can't help point you in the right direction, but I do remember reading about this being done before. It was in the context of people complaining about baby pictures being caught and flagged mistakenly. If nothing else, I can give you the hope that you don't have to reinvent the wheel all by yourself... Someone else has been down this road!

Brian Knoblauch
0

CrowdSifter by Dolores Labs might do the trick for you. I read their blog all the time, as they seem to love statistics and crowdsourcing and like to talk about it. They use Amazon's Mechanical Turk for a lot of their processing and know how to process the results to get the right answers out of things. Check out their blog at the very least to see some cool statistical experiments.

reconbot