0

I have the code below, which works OK-ish.

$swearWords = file("blacklist.txt");
foreach ($swearWords as $naughty)
{
    $post = str_ireplace(rtrim($naughty), "<b><i>(oops)</i></b>", $post); 
}

The problem is with words that contain these swear words.

For instance, "Scunthorpe" has a bad word within it, and this code changes it to S(oops)horpe.

Any ideas how I can fix this? Do I need to

Lee

3 Answers

2

You can replace your str_ireplace() with a preg_replace() that ignores words with leading and/or trailing letters, so a swear word is only replaced if it's standing alone:

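$post = "some Scunthorpe text";
$newpost = $post;
// FILE_IGNORE_NEW_LINES strips the trailing newline that file() otherwise keeps on each entry
$swearWords = file("blacklist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($swearWords as $naughty)
{
    $naughty = preg_quote(trim($naughty), '/');
    if ($naughty === '') continue; // an empty pattern would match everywhere
    // Lookarounds: match only when no letter directly precedes or follows the word
    $newpost = preg_replace("/(?<![a-z]){$naughty}(?![a-z])/i", "<b><i>(oops)</i></b>", $newpost);
    if ($newpost === null) break; // preg_replace() returns null on error
}
if ($newpost !== null) $post = $newpost;
else echo "an error occurred during regex replacement";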

Note that it still allows swear words embedded in other text, like "aCUNT" or "soFUCKINGstupid". I don't know how you could even handle that.
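As a rough sketch of the standalone-word idea, the function below builds one pattern from the whole list instead of running one replacement per word. The function name `censorWholeWords`, the in-memory list and the plain `(oops)` mask are made up for illustration, and it assumes PHP 7.4 or later for the arrow function.

```php
<?php
// Hypothetical helper: censor blacklisted words only when they stand alone.
function censorWholeWords(string $text, array $blacklist, string $mask = '(oops)'): string
{
    // Escape each entry so it is treated literally inside the pattern.
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $blacklist);
    if ($quoted === []) {
        return $text; // nothing to filter
    }
    // One alternation; the lookarounds reject matches that touch a letter.
    $pattern = '/(?<![a-z])(?:' . implode('|', $quoted) . ')(?![a-z])/i';
    $result = preg_replace($pattern, $mask, $text);
    return $result ?? $text; // null signals a regex error; keep the original text
}

echo censorWholeWords('The ass sat in class.', ['ass']), "\n";
// prints: The (oops) sat in class.
```

The limitation mentioned above still applies: a word such as "dumbass" passes through untouched, because the blacklisted part is preceded by a letter.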

Kaii
1

Swear and profanity filters are notoriously prone to false positives.

The easiest way of dealing with these, in dictionary terms, is to use a whitelist alongside your blacklist: a list of words that contain matches but are allowed anyway.

It's worth reading "How do you implement a good profanity filter?", which details the pros and cons.
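A minimal sketch of how such a whitelist could sit in front of the blacklist check. The function name `censorWithWhitelist`, the sample lists and the plain `(oops)` mask are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: mask any word containing a blacklisted term,
// unless the whole word is explicitly whitelisted.
function censorWithWhitelist(string $text, array $blacklist, array $whitelist, string $mask = '(oops)'): string
{
    // Lower-cased lookup table of allowed words.
    $allowed = array_flip(array_map('strtolower', $whitelist));

    $result = preg_replace_callback('/[a-z]+/i', function (array $m) use ($blacklist, $allowed, $mask) {
        $word = $m[0];
        if (isset($allowed[strtolower($word)])) {
            return $word; // whitelisted: leave untouched
        }
        foreach ($blacklist as $naughty) {
            if ($naughty !== '' && stripos($word, $naughty) !== false) {
                return $mask; // contains a blacklisted term
            }
        }
        return $word; // clean word
    }, $text);

    return $result ?? $text; // null signals a regex error; keep the original text
}

echo censorWithWhitelist('A dumbass skipped class.', ['ass'], ['class']), "\n";
// prints: A (oops) skipped class.
```

The trade-off is maintenance: every innocent word that contains a blacklisted term has to be added to the whitelist by hand.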

Community
nickhar
0

This oughta do it:

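// FILE_IGNORE_NEW_LINES strips the trailing newline that file() otherwise keeps on each entry
$swearWords = file("blacklist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$post_words = preg_split("/\s+/", $post);

foreach ($swearWords as $naughty)
{
    foreach($post_words as &$word)
    {
        // Substring test: any word containing the swear word is replaced entirely
        if(stripos($word, $naughty) !== false)
        {
            $word = "<b><i>(oops)</i></b>";
        }
    }
    unset($word); // break the reference left over from the by-reference loop
}
$post = implode(' ', $post_words);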

So what's happening? It loads your swear words and loops through them. For each one, it loops through all the words in the post and checks whether the current swear word occurs in the current word. If it does, the whole word is replaced with your 'oops'.

Note that this removes any whitespace formatting, so check that this suits your situation first (do you care about tab characters or multiple sequential spaces?).
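If the whitespace does matter, one possible workaround is to capture the separators while splitting, so they can be glued back unchanged. This is only a sketch: the function name `censorKeepingWhitespace`, the in-memory list and the plain `(oops)` mask are invented for the example.

```php
<?php
// Hypothetical helper: same substring check as above, but whitespace is preserved.
function censorKeepingWhitespace(string $text, array $blacklist, string $mask = '(oops)'): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured whitespace runs in the result array.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if (trim($part) === '') {
            continue; // whitespace run or empty piece: leave as-is
        }
        foreach ($blacklist as $naughty) {
            if ($naughty !== '' && stripos($part, $naughty) !== false) {
                $parts[$i] = $mask; // replace the whole token
                break;
            }
        }
    }

    // Joining with '' restores the original tabs, newlines and repeated spaces.
    return implode('', $parts);
}

echo censorKeepingWhitespace("you\tass,  really\nnow", ['ass']);
```

Because tokens are split on whitespace only, punctuation attached to a matched word (the comma in the example) is replaced along with it.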

deed02392
  • This splits the post into an array of words and discards all whitespace, line breaks and indentation. Even if you reassemble the result into text (e.g. `implode(" ", $post_words)`), the original whitespace formatting is lost. – Kaii Nov 08 '12 at 11:24