9

I'm thinking of something like:

using System.Linq; // needed for Contains on an array

foreach (var word in paragraph.Split(' ')) {
  if (badWordArray.Contains(word)) {
    // do something about it
  }
}

but I'm sure there's a better way.

Thanks in advance!

UPDATE: I'm not looking to remove obscenities automatically... For my web app, I want to be notified if a word I deem "bad" is used. Then I'll review it myself to make sure it's legit. An auto-flagging system of sorts.

Keng
Chaddeus
  • I went ahead and edited my solution in response to your update. Let me know if that answers your question. – rakuo15 Jul 09 '10 at 10:29
  • possible duplicate of [How do you implement a good profanity filter?](http://stackoverflow.com/questions/273516/how-do-you-implement-a-good-profanity-filter) – George Stocker Oct 22 '10 at 16:21

3 Answers

16

While your approach works, it may be a bit time-consuming. There is a wonderful response here for a previous SO question. Though that question is about PHP rather than C#, I think it can be ported easily.

Edit to add sample code:

using System.Text.RegularExpressions;

// Replaces every occurrence of any listed word with "<3".
public string FilterWords(string inputWords) {
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    return wordFilter.Replace(inputWords, "<3");
}

That should work for you, more or less.

Edit to answer the OP's clarification:

I'm not looking to remove obscenities automatically... for my web app, I want to be notified if a word I deem "bad" is used.

Much like the replacement code above, you can check whether the input matches like so:

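using System.Text.RegularExpressions;

// Returns true if any listed word occurs anywhere in the input.
public bool HasBadWords(string inputWords) {
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    return wordFilter.IsMatch(inputWords);
}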

It will return true if the string you passed to it contains any words in the list.
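For the flag-for-review use case, a minimal sketch might build the pattern from a word list instead of hard-coding it, escape each entry with Regex.Escape, add \b word boundaries so that embedded matches such as "redkittens" are not flagged, and return the distinct words found rather than a bool. The list contents and the name FindBadWords are hypothetical, and choosing case-insensitive matching is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical word list; in a real app this might come from config or a database.
string[] badWords = { "puppies", "kittens", "dolphins", "crabs" };

// Builds \b(puppies|kittens|dolphins|crabs)\b. Regex.Escape guards against
// list entries that happen to contain regex metacharacters.
var wordFilter = new Regex(
    @"\b(" + string.Join("|", badWords.Select(Regex.Escape)) + @")\b",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Returns each distinct listed word found in the input (lowercased), so the
// post can be queued for manual review instead of being altered.
List<string> FindBadWords(string input) =>
    wordFilter.Matches(input)
        .Cast<Match>()
        .Select(m => m.Value.ToLowerInvariant())
        .Distinct()
        .ToList();

var flagged = FindBadWords("Kittens and crabs, but not redkittens or crabsapples.");
Console.WriteLine(string.Join(", ", flagged)); // prints "kittens, crabs"
```

An empty result means nothing needs review; a non-empty one can be sent along with the notification so the reviewer sees which words triggered it. Dropping the \b anchors restores the substring behavior of the original pattern, at the cost of more false positives.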

rakuo15
  • If you're going to do this, **don't forget the `\b`**. It's a clbuttic mistake. – JSBձոգչ Jul 09 '10 at 03:34
  • Haha well done. The word boundary is important for sure, but if you want to filter for things like `redkittens` or `crabsapples`, this would do it. – rakuo15 Jul 09 '10 at 03:55
  • Thank you, I think a combination of your answer and Detmar's is what I'll end up doing. Much appreciated. – Chaddeus Jul 09 '10 at 11:26
  • I take it the regex way is more efficient than the looping way and only needs 1 pass? – pete Feb 04 '23 at 00:49
4

At my job we put some automatic bad word filtering into our software (it's kind of shocking to be browsing the source and suddenly run across the array containing several pages of obscenity).

One tip is to pre-process the user input before testing it against your list, in case someone is trying to sneak something past you. So, by way of preprocessing, we:

  • uppercase everything in the input
  • remove most non-alphanumerics (that is, just splice out any spaces, punctuation, etc.)
  • and then, assuming someone is trying to pass off digits as letters, do something like this: replace zero with O, 9 with G, 5 with S, etc. (get creative)

And then get some friends to try to break it. It's fun.
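The three preprocessing steps could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: the name Normalize is hypothetical, it strips every non-alphanumeric character rather than "most", and the substitution table holds only the three digit swaps mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Digit-to-letter swaps from the answer; add more as needed ("get creative").
var lookalikes = new Dictionary<char, char>
{
    ['0'] = 'O',
    ['9'] = 'G',
    ['5'] = 'S',
};

// Uppercases the input, drops non-alphanumerics, then swaps look-alike digits.
string Normalize(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input.ToUpperInvariant())
    {
        if (!char.IsLetterOrDigit(c)) continue; // splice out spaces, punctuation, etc.
        sb.Append(lookalikes.TryGetValue(c, out var letter) ? letter : c);
    }
    return sb.ToString();
}

Console.WriteLine(Normalize("5p0t the d0g!")); // prints "SPOTTHEDOG"
```

Because the spaces are gone after normalization, checking the result against an uppercased word list has to be a substring search (for example, normalized.Contains("DOG")), which makes evasion harder but produces more false positives than whole-word matching.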

Detmar
2

You could consider using a HashSet<T> or a Dictionary<TKey, TValue> instead of the array. Hash-based lookups (HashSet<T>.Contains, Dictionary.ContainsKey) take roughly constant time, whereas Contains on an array has to scan every element, so this can make the code noticeably more efficient. This is especially true if you have a large list of profanities (not sure how many there are! :)
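Applied to the loop from the question, a minimal sketch might look like this. The word list and the name FlagWords are hypothetical, and splitting on common punctuation as well as spaces is an assumption (the question splits on ' ' only, so "kittens," would slip through).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hash-based set: Contains is an average O(1) lookup instead of an array scan.
// StringComparer.OrdinalIgnoreCase makes "Kittens" match "kittens".
var badWords = new HashSet<string>(
    new[] { "puppies", "kittens", "dolphins", "crabs" }, // hypothetical list
    StringComparer.OrdinalIgnoreCase);

// Splits on whitespace and common punctuation, then keeps words in the set.
List<string> FlagWords(string paragraph) =>
    paragraph
        .Split(new[] { ' ', '\t', '\n', '\r', '.', ',', '!', '?', ';', ':' },
               StringSplitOptions.RemoveEmptyEntries)
        .Where(badWords.Contains)
        .ToList();

Console.WriteLine(string.Join(", ", FlagWords("I love Kittens, and crabs!"))); // prints "Kittens, crabs"
```

The words are returned with their original casing, which may be handy when showing the reviewer exactly what was typed.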

Alex