120

Akismet does an amazing job of detecting spam comments. But comments are not the only form of spam these days. What if I wanted something like Akismet to automatically detect porn images on a social networking site which allows users to upload their pics, avatars, etc.?

There are already a few image-based search engines as well as face-recognition tools available, so I'm assuming it wouldn't be rocket science and could be done. However, I have no clue how that stuff works or how I should go about it if I want to develop it from scratch.

How should I get started?

Is there any open source project for this going on?

Andy
  • 49,085
  • 60
  • 166
  • 233
Raj
  • 6,810
  • 6
  • 48
  • 56
  • 82
    Actually I'd say that sounds a lot harder than rocket science! We've already got lots of rockets, but AFAIK no such "porn detector" :) – GaZ Apr 03 '09 at 10:17
  • 4
    LOL. There's face recognition, but there's no technology as yet for genital and breast recognition. Tough luck. – Jon Limjap Apr 03 '09 at 10:23
  • 11
Pornography is a matter of geography - besides, I am sure there are plenty of pictures that do not show genitals or nudity at all, which would be considered quite hardcore (again - in some places). Sounds like a job for an advanced AI, not a simple algorithm. – Noam Gal Jun 17 '09 at 09:10
  • 2
    I just stumbled across this utility that made me remember this thread. Wonder if it works? http://proofpronto.com/porn-detection-stick-by-paraben.html – Martin Smith Apr 17 '10 at 12:27
  • Interesting question. Some googling reveals this slashdot article, which might be a good starting point: http://tech.slashdot.org/story/00/11/15/1354239/Even-More-Porn-Image-Recognition-Software – Denis de Bernardy May 28 '11 at 16:36
@Denis: That post is 2 years old, so perhaps there may be some new software or articles available. Still worth looking through, though. –  May 28 '11 at 16:39
  • 1
    @jm666, if the problem is that serious to you, and you think the state of the art in this area has improved over the last two years, how about placing a bounty on the referenced dupe? You should gather some attention, and maybe some answers, that way. – Michael Petrotta May 30 '11 at 18:49
  • Developing AI for this calls for a similar approach to CAPTCHA. Instead of helping OCR, people visiting porn sites should be presented with a few pictures first (some of which are clearly not porn, some of which clearly are and some of which might be) and asked to decide whether each is porn or not, before they're allowed to watch the actual content. ;) Their geographical location might be noted to take cultural differences into account. – Thijs van Dien Jan 19 '13 at 12:23
`bool is_porn(image im) { (void) im; return false; }`, guaranteed to correctly determine whether any image should be blocked with 0% false positive and 0% false negative rates. – David X Oct 16 '13 at 17:51

25 Answers

89

This is actually reasonably easy. You can programmatically detect skin tones, and porn images tend to have a lot of skin. This will create false positives, but if that's a problem you can pass the flagged images through actual moderation. This not only greatly reduces the work for moderators but also gives you lots of free porn. It's win-win.

#!python
import os, glob
from PIL import Image

def get_skin_ratio(im):
    # Crop away a 20% border on each side so we measure the centre of the image.
    w, h = im.size
    im = im.crop((int(w * 0.2), int(h * 0.2), w - int(w * 0.2), h - int(h * 0.2)))
    im = im.convert('RGB')  # normalise palette/RGBA images to plain RGB tuples
    # Count pixels whose colour falls inside a rough skin-tone band.
    skin = sum(count for count, (r, g, b) in im.getcolors(im.size[0] * im.size[1])
               if r > 60 and r * 0.4 < g < r * 0.85 and r * 0.2 < b < r * 0.7)
    return float(skin) / (im.size[0] * im.size[1])

for image_dir in ('porn', 'clean'):
    for image_file in glob.glob(os.path.join(image_dir, "*.jpg")):
        skin_percent = get_skin_ratio(Image.open(image_file)) * 100
        if skin_percent > 30:
            print("PORN {0} has {1:.0f}% skin".format(image_file, skin_percent))
        else:
            print("CLEAN {0} has {1:.0f}% skin".format(image_file, skin_percent))

This code measures skin tones in the center of the image. I've tested it on 20 relatively tame "porn" images and 20 completely innocent images. It flags 100% of the "porn" and 4 of the 20 clean images. That's a pretty high false-positive rate, but the script aims to be fairly cautious and could be tuned further. It works on light, dark and Asian skin tones.

Its main weaknesses with false positives are brown objects like sand and wood, and of course it doesn't know the difference between "naughty" and "nice" flesh (like face shots).

Its main weaknesses with false negatives are images without much exposed flesh (like leather bondage), painted or tattooed skin, B&W images, etc.

source code and sample images

SpliFF
  • 38,186
  • 16
  • 91
  • 120
69

This was written in 2000, not sure if the state of the art in porn detection has advanced at all, but I doubt it.

http://www.dansdata.com/pornsweeper.htm

PORNsweeper seems to have some ability to distinguish pictures of people from pictures of things that aren't people, as long as the pictures are in colour. It is less successful at distinguishing dirty pictures of people from clean ones.

With the default, medium sensitivity, if Human Resources sends around a picture of the new chap in Accounts, you've got about a 50% chance of getting it. If your sister sends you a picture of her six-month-old, it's similarly likely to be detained.

It's only fair to point out amusing errors, like calling the Mona Lisa porn, if they're representative of the behaviour of the software. If the makers admit that their algorithmic image recogniser will drop the ball 15% of the time, then making fun of it when it does exactly that is silly.

But PORNsweeper only seems to live up to its stated specifications in one department - detection of actual porn. It's half-way decent at detecting porn, but it's bad at detecting clean pictures. And I wouldn't be surprised if no major leaps were made in this area in the near future.

Community
  • 1
  • 1
Jeff Atwood
  • 63,320
  • 48
  • 150
  • 153
Of course porn detection has advanced since then. There have been a lot of breakthroughs in object recognition/image classification/computer vision. 2000 feels like the stone age to me. – Maarten Oct 17 '13 at 12:28
45

I would rather let users report bad images. Image-recognition development can take too much effort and time and won't be as accurate as human eyes. It's much cheaper to outsource that moderation job.

Take a look at: Amazon Mechanical Turk

"The Amazon Mechanical Turk (MTurk) is one of the suite of Amazon Web Services, a crowdsourcing marketplace that enables computer programs to co-ordinate the use of human intelligence to perform tasks which computers are unable to do."

Konstantin Tarkus
  • 37,618
  • 14
  • 135
  • 121
15

BOOM! Here is the whitepaper containing the algorithm.

Does anyone know where to get the source code for a Java (or any language) implementation?

That would rock.

One algorithm, called WISE, has a 98% accuracy rate but a 14% false-positive rate. So you let users flag the 2% of false negatives, ideally with automatic removal once a certain number of users flag an image, and have moderators review the 14% of false positives.
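The flag-and-review workflow can be sketched in a few lines. This is a hypothetical illustration, not part of WISE: the threshold value and the in-memory storage are assumptions, and a real site would persist flags in a database.

```python
from collections import defaultdict

class FlagTracker:
    """Track distinct user flags per image; auto-remove past a threshold."""

    def __init__(self, threshold=5):
        self.threshold = threshold          # assumed value; tune for your site
        self.flags = defaultdict(set)       # image_id -> set of flagging users
        self.removed = set()

    def flag(self, image_id, user_id):
        """Record a flag; return True if the image is (now) removed."""
        if image_id in self.removed:
            return True
        self.flags[image_id].add(user_id)   # a user can only flag once
        if len(self.flags[image_id]) >= self.threshold:
            self.removed.add(image_id)
            return True
        return False

tracker = FlagTracker(threshold=3)
tracker.flag("img42", "alice")
tracker.flag("img42", "alice")              # duplicate flag is ignored
tracker.flag("img42", "bob")
removed = tracker.flag("img42", "carol")    # third distinct flagger trips removal
```

The 14% of false positives would go the other way: a moderator dashboard fed by whatever the detector flags, rather than by user reports.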

Ura
  • 2,173
  • 3
  • 24
  • 41
davidjnelson
  • 1,111
  • 12
  • 22
  • You found the algorithm. That's pretty darn good. The source code is often left as an exercise. After all, we aren't specifying any particular programming language, are we? – Ian Sep 19 '10 at 03:03
9

Nude.js based on the whitepaper by Rigan Ap-apid from De La Salle University.

Ura
  • 2,173
  • 3
  • 24
  • 41
Abhinav Kaushik
  • 163
  • 1
  • 5
8

There is software that estimates the probability that an image is porn, but this is not an exact science, since computers can't recognize what is actually in a picture (a picture is just a big grid of values with no inherent meaning). You can only teach the computer what is and isn't porn by giving it examples. This has the disadvantage that it will only recognize those or similar images.

Given the repetitive nature of porn, you have a good chance of training a system with few false positives. For example, if you train the system on nude people, it may flag pictures of a beach with "almost" naked people as porn too.

Similar software is the Facebook face-recognition feature that came out recently. It's specialized for faces, but the main principle is the same.

Technically, you would implement some kind of feature detector combined with Bayesian filtering. A simple feature detector might look at the percentage of flesh-colored pixels; a more elaborate one might compute the similarity of the current image to a set of saved porn images.

This is of course not limited to porn; it's actually more of a corner case. I think more common are systems that try to find other things in images ;-)
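As a toy illustration of the Bayesian-filtering idea, here is a one-feature naive Bayes classifier over the skin-pixel ratio. The training numbers are fabricated for the example; a real detector would combine many features and learn from real labeled images.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution; models one feature per class."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class OneFeatureBayes:
    """Naive Bayes with a single continuous feature (e.g. skin-pixel ratio)."""

    def fit(self, ratios, labels):
        self.params = {}
        for cls in set(labels):
            xs = [r for r, l in zip(ratios, labels) if l == cls]
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs) or 1e-6
            self.params[cls] = (mean, var, len(xs) / len(ratios))
        return self

    def predict(self, ratio):
        # Pick the class maximising prior * likelihood.
        scores = {cls: prior * gaussian_pdf(ratio, mean, var)
                  for cls, (mean, var, prior) in self.params.items()}
        return max(scores, key=scores.get)

# Fabricated skin ratios for labeled "porn" vs "clean" training images.
ratios = [0.55, 0.62, 0.48, 0.70, 0.05, 0.12, 0.20, 0.08]
labels = ["porn"] * 4 + ["clean"] * 4
clf = OneFeatureBayes().fit(ratios, labels)
```

With the image-similarity variant, the "feature" would instead be a distance to the nearest saved porn image, but the Bayesian combination step looks the same.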

Peter G.
  • 14,786
  • 7
  • 57
  • 75
Patrick Cornelissen
  • 7,968
  • 6
  • 48
  • 70
  • 1
    Why do people down-vote this answer? – Patrick Cornelissen Apr 21 '09 at 07:37
  • because it doesn't contain anything like an algorithm, recipe, or reference. – Ian Sep 19 '10 at 02:49
  • 7
So it's not a valid answer to explain to the user asking the question that what he's trying to achieve isn't really possible? Dude, you might be a little bit more relaxed... – Patrick Cornelissen Sep 20 '10 at 06:46
  • It's also making a false statement "as computers can't recognize what is actually on pictures" – Daveth3Cat Oct 16 '13 at 15:20
  • Because they can't. You can only learn to detect certain images and the larger your db of positive and negative cases is, the better, but in general you will never get a solution that is as accurate as a human, so you will end up with a huge number of false positives and negatives. – Patrick Cornelissen Oct 17 '13 at 16:40
5

A graduate student from National Cheng Kung University in Taiwan did research on this subject in 2004. He was able to achieve a success rate of 89.79% in detecting nude pictures downloaded from the Internet. Here is the link to his thesis: The Study on Naked People Image Detection Based on Skin Color.
It's in Chinese, so you may need a translator if you can't read it.

myang
  • 193
  • 1
  • 9
5

The answer is really easy: it's pretty safe to say that it won't be possible in the next two decades. Before that we will probably get good translation tools. The last time I checked, the AI guys were struggling to identify the same car in two photographs shot from slightly different angles. Look at how long it took them to get good-enough OCR or speech recognition together. Those are recognition problems which can benefit greatly from dictionaries, and they are still far from having completely reliable solutions despite the multi-million man-months thrown at them.

That being said, you could simply add an "offensive?" link next to user-generated content and have a mod cross-check the incoming complaints.

edit:

I forgot something: if you are going to implement some kind of filter, you will need a reliable one. If your solution were only 50% right, 2,000 out of 4,000 users with decent images would get blocked. Expect an outrage.

Thomasz
  • 233
  • 2
  • 8
4

Short answer: use a moderator ;)

Long answer: I don't think there's a project for this, because what is porn? Only legs, full nudity, midgets, etc.? It's subjective.

RvdK
  • 19,580
  • 4
  • 64
  • 107
  • 3
    the question is "What is the best way to programatically detect porn images?", programatically... – Agusti-N Apr 03 '09 at 13:30
  • 5
I know the question, but as I said there is no 100% accurate porn blocker because porn is subjective, and the subjective can't be captured in code. One person thinks it's just nudity; another thinks it's porn. A better solution is to have a 'report image' button. Same idea as Koistya Navin's. – RvdK Apr 03 '09 at 14:00
  • 1
    "Midgets etc."? Holy non-sequitur, Batman. – Doug McClean Oct 17 '09 at 04:16
  • There is such a thing as midget porn. – Chris Sherlock Oct 16 '13 at 16:08
4

Add an "offensive" link and store the MD5 (or another hash) of the offending image so that it can be automatically tagged in the future.

How cool would it be if somebody ran a large public database of image MD5s along with descriptive tags as a web service? A lot of porn isn't original work (the person who has it now probably didn't make it), and the popular images tend to float around different places, so this could really make a difference.
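A minimal sketch of the hash-lookup idea using Python's hashlib. The blocklist here is hypothetical (it contains the digest of the string "hello" purely so the example runs), and, as pointed out in the comments, any crop, resize, or re-save defeats an exact hash.

```python
import hashlib

# Hypothetical blocklist of MD5 digests of previously flagged images.
KNOWN_BAD_HASHES = {
    "5d41402abc4b2a76b9719d911017c592",  # example digest only, not a real image
}

def image_md5(data: bytes) -> str:
    """Hex MD5 digest of the raw image bytes."""
    return hashlib.md5(data).hexdigest()

def is_known_bad(data: bytes) -> bool:
    """True if this exact byte stream was flagged before."""
    return image_md5(data) in KNOWN_BAD_HASHES

flagged = is_known_bad(b"hello")        # digest of b"hello" is in the blocklist
clean = is_known_bad(b"other bytes")
```

A perceptual hash (one that survives re-encoding and small edits) would be a better fit for images than MD5, at the cost of a more complex lookup.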

rfusca
  • 7,435
  • 2
  • 30
  • 34
  • 8
    I doubt it. There is SO much porn out there (and tons more generated by the day) that your odds of seeing the same picture twice are (IMHO) rather close to zero. – Vilx- Apr 03 '09 at 12:59
  • Think about how often tub girl showed up all over for awhile. It would have gotten flagged once and then everybody else could have avoided it. – rfusca Apr 03 '09 at 13:19
  • 3
    unless it were cropped, resized, or just opened and saved again before being uploaded.. – Blorgbeard Apr 06 '09 at 20:40
  • Ya, I thought about that :( eh, it was a thought. – rfusca Apr 07 '09 at 14:39
  • 1
    Better than md5, licence idée's TinEye. – Tobu Jan 11 '10 at 12:30
  • @rfusca: Damn you for mentioning tub girl, that's just sick and now I won't be able to sleep. – Alix Axel Mar 23 '10 at 05:11
2

If you really have time and money:

One way of doing it is by 1) writing an image-detection algorithm to decide whether an object is human or not. This can be done by bitmasking an image to retrieve its "contours" and seeing if the contours fit a human contour.

2) Data-mining a lot of porn images and using data-mining techniques such as the C4.5 algorithm or Particle Swarm Optimization to learn to detect patterns that match porn images.

This will require that you identify what the contours of a naked man's/woman's body look like in digitized form (this can be achieved in the same way OCR image-recognition algorithms work).

Hope you have fun! :-)

Buhake Sindi
  • 87,898
  • 29
  • 167
  • 228
2

Seems to me like the main obstacle is defining a "porn image". If you can define it easily, you could probably write something that would work. But even humans can't agree on what is porn. How will the application know? User moderation is probably your best bet.

Rimian
  • 36,864
  • 16
  • 117
  • 117
1

The BrightCloud web service API is perfect for this. It's a REST API for doing website lookups just like this. It contains a very large and very accurate web filtering DB and one of the categories, Adult, has over 10M porn sites identified!

Chris Harris
  • 4,705
  • 3
  • 24
  • 22
1

I've heard about tools which use a very simple but quite effective algorithm: calculate the relative number of pixels with a color value near some predefined "skin" colors. If that amount is higher than some predefined threshold, the image is considered to be erotic/pornographic content. Of course that algorithm will give false positives for close-up face photos and many other things.
Since you are writing about social networking, there will be lots of "normal" photos with a high amount of skin color in them, so you shouldn't use this algorithm to reject every picture with a positive result. But you can use it to help moderators, for example by flagging these pictures with higher priority, so that a moderator checking new pictures for pornographic content can start with them.
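The "flag with higher priority" idea maps directly onto a heap: moderators always pull the image with the highest skin ratio first. The skin-ratio scores below are made up and assumed to come from a detector like the one described; the queue itself is just stdlib heapq.

```python
import heapq

class ModerationQueue:
    """Serve images to moderators, highest skin ratio first."""

    def __init__(self):
        self._heap = []

    def add(self, image_file, skin_ratio):
        # heapq is a min-heap, so negate the ratio to pop highest-first.
        heapq.heappush(self._heap, (-skin_ratio, image_file))

    def next_for_review(self):
        """Return the most suspicious pending image, or None if empty."""
        if not self._heap:
            return None
        _neg_ratio, image_file = heapq.heappop(self._heap)
        return image_file

queue = ModerationQueue()
queue.add("beach.jpg", 0.45)      # fabricated scores for illustration
queue.add("cat.jpg", 0.02)
queue.add("suspect.jpg", 0.80)
first = queue.next_for_review()   # highest skin ratio comes out first
```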

okutane
  • 13,754
  • 10
  • 59
  • 67
I've actually seen a system similar to that in use. It's not reliable enough to be left on its own, but it does a very good job of alerting a moderator when appropriate. It's not foolproof, especially if the person is covered with only one small exposed area. The ratio doesn't quite work as reliably in reverse. – Tim Post Sep 29 '09 at 07:15
1

This one looks promising. Basically they detect skin (with calibration by recognizing faces) and determine "skin paths" (i.e. measuring the proportion of skin pixels vs. face skin pixels / skin pixels). This has decent performance. http://www.prip.tuwien.ac.at/people/julian/skin-detection

alexsee75
  • 11
  • 1
1

I've seen a web-filtering application which does porn-image filtering; sorry, I can't remember the name. It was pretty prone to false positives, but most of the time it worked.

I think the main trick is detecting "too much skin on the picture". :)

dr. evil
  • 26,944
  • 33
  • 131
  • 201
  • 1
    I can't remember the study either - but it did an edge detection and matched what appeared to be patterns of vulvas rotated or obscured. Quite interesting from an image processing aspect. – jim Apr 03 '09 at 10:37
  • -1, This provides commentary but doesn't give a substantial solution. – Brad Koch Oct 16 '13 at 16:06
1

Detecting porn images is still a definite AI task, and very much theoretical for now.

Harvest collective power and human intelligence by adding a "Report spam/abuse" button or link. Or employ several moderators to do this job.

P.S. I'm really surprised how many people ask questions assuming software and algorithms are almighty, without even thinking about whether what they want can be done. Are they representatives of that new breed of programmers who have no understanding of hardware, low-level programming and all that "magic behind" it?

P.S. #2. I also remember that periodically a case arises in which people themselves cannot decide whether a picture is porn or art, and it goes to court. Even after the court rules, chances are half of the people will consider the decision wrong. The last stupid situation of the kind was quite recent, when a Wikipedia page was banned in the UK because of a CD cover image featuring some nakedness.

User
  • 30,403
  • 22
  • 79
  • 107
1

Two options I can think of (though neither of them is programmatically detecting porn):

  1. Block all uploaded images until one of your administrators has looked at them. There's no reason why this should take a long time: you could write some software that shows 10 images a second, almost as a movie - even at this speed, it's easy for a human being to spot a potentially pornographic image. Then you rewind in this software and have a closer look.
  2. Add the usual "flag this image as inappropriate" option.
Rich
  • 15,602
  • 15
  • 79
  • 126
0

It is not rocket science. Not anymore. It is very similar to face recognition. I think the easiest way to deal with it is to use machine learning, and since we are dealing with images, I can point you towards neural networks, because these seem to be preferred for images. You will need training data, and you can find tons of it on the internet, but you will have to crop the images to the specific part that you want the algorithm to detect. Of course you will have to break the problem into the different body parts that you want to detect and create training data for each, and this is where things become amusing.

Like someone above said, it cannot be done 100 percent. There will be cases where such algorithms fail. The actual precision will be determined by your training data, the structure of your neural networks and how you choose to cluster the training data (penises, vaginas, breasts, etc., and combinations of such). In any case I am very confident that this can be achieved with high accuracy for explicit porn imagery.
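A real neural network is beyond a snippet, but the core idea - learn a decision boundary from labeled feature vectors - can be sketched with a tiny logistic regression trained by gradient descent. The two features used here (skin ratio, edge density) and all the data points are fabricated for illustration; a network would simply learn richer features from the pixels instead.

```python
import math

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit w, b for P(porn) = sigmoid(w . x + b) on 2-D feature vectors."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y                     # gradient of log-loss w.r.t. z
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """True if the model scores the feature vector as porn."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z)) > 0.5

# Fabricated features: (skin ratio, edge density); label 1 = porn.
features = [(0.60, 0.20), (0.70, 0.30), (0.50, 0.25),
            (0.10, 0.60), (0.05, 0.50), (0.15, 0.70)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(features, labels)
```

Swapping this for a convolutional network changes the model, not the workflow: collect labeled data, train, measure precision on held-out images, tune.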

Radu Simionescu
  • 4,518
  • 1
  • 35
  • 34
0

This is a nudity detector. I haven't tried it. It's the only OSS one I could find.

https://code.google.com/p/nudetech

mikeslattery
  • 4,039
  • 1
  • 19
  • 14
0

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.

United States Supreme Court Justice Potter Stewart, 1964

Jason S
  • 184,598
  • 164
  • 608
  • 970
0

Look at the file name and any attributes. There's nowhere near enough information there to detect even 20% of naughty images, but a simple keyword blacklist would at least detect images with descriptive labels or metadata. Twenty minutes of coding for a 20% success rate isn't a bad deal, especially as a prescreen that can catch some simple cases before you pass the rest to a moderator for judging.

The other useful trick is of course the opposite: maintain a whitelist of image sources that are allowed without moderation or checking. If most of your images come from known safe uploaders or sources, you can just accept them blindly.
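Both tricks together fit in a few lines. The keyword list and trusted-uploader set below are placeholders for whatever a real site would maintain:

```python
import os

# Placeholder lists; a real deployment would store and curate these elsewhere.
BLACKLIST_KEYWORDS = {"xxx", "porn", "nsfw"}
TRUSTED_UPLOADERS = {"staff_photographer", "stock_image_bot"}

def prescreen(filename, uploader):
    """Return 'allow', 'block', or 'moderate' for an uploaded image."""
    if uploader in TRUSTED_UPLOADERS:
        return "allow"                      # whitelist: skip all checks
    name = os.path.splitext(os.path.basename(filename))[0].lower()
    if any(word in name for word in BLACKLIST_KEYWORDS):
        return "block"                      # cheap filename keyword prescreen
    return "moderate"                       # everything else goes to a human

verdict_bad = prescreen("hot_xxx_pic.jpg", "random_user")
verdict_trusted = prescreen("anything.jpg", "staff_photographer")
verdict_unknown = prescreen("holiday.jpg", "random_user")
```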

SPWorley
  • 11,550
  • 9
  • 43
  • 63
0

You can find many whitepapers on the net dealing with this subject.

Ura
  • 2,173
  • 3
  • 24
  • 41
-1

There is no way you could do this 100% (I would say maybe 1-5% would be plausible) with today's knowledge. You would get a much better result (than that 1-5%) by just checking the image names for sex-related words :).

@SO Troll: So true.

sabiland
  • 2,526
  • 1
  • 25
  • 24