I'm trying to implement a bit of code I found on Stack Overflow that provides a spam-word filter. When I type only a spam word, the function works; however, when I type a bunch of text before the spam word, it passes. I've checked the source and I must be missing something. Can anyone help?

The code is:

function strpos_arr($haystack, $needle) {
    if (!is_array($needle)) $needle = array($needle);
    foreach ($needle as $what) {
        if (($pos = strpos($haystack, $what)) !== false) return $pos;
    }
    return false;
}

I'm calling the function like this:

if (strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

The array is just a simple array with a lot of bad words, like so:

$bad_words = array(
    'bad word 1',
    'bad word 2'
);

Link to the original article: Using an array as needles in strpos

Thanks

jamper

3 Answers

Firstly, it looks like you have your logic the wrong way round. I think:

if(strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

should be:

if (strpos_arr($text, $bad_words)) {
    return TRUE;
} else {
    return FALSE;
}

Then, you're returning $pos if a bad word is found. If $pos happens to be zero (the bad word is at the very start of the text), it's going to fail the next check, because 0 is treated as false in an if condition. Unless you need to know the position of the bad word in the text, I would change it to:

if (($pos = strpos($haystack, $what)) !== FALSE) return TRUE;
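
As a minimal sketch of that change, the helper below returns only a boolean. The name contains_any is hypothetical (it is not in the original code) and was picked so that the name no longer suggests a position is returned; the sample strings are made up.

```php
<?php
// Hypothetical helper: true if any needle occurs anywhere in the haystack.
function contains_any(string $haystack, array $needles): bool {
    foreach ($needles as $needle) {
        // strpos() returns an int offset or false, so compare strictly against false.
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$bad_words = array('bad word 1', 'bad word 2');

var_dump(contains_any('bad word 1 at the start', $bad_words));    // bool(true)
var_dump(contains_any('some text, then bad word 2', $bad_words)); // bool(true)
var_dump(contains_any('nothing to see here', $bad_words));        // bool(false)
```

Because the result is always a boolean, it can go straight into an if condition without the offset-zero problem.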
danmullen
  • It's more a case of if a bad word is found flag an error message. – jamper Oct 14 '14 at 13:01
  • Yes, I thought so. The changes in my answer should sort it out. Have you tried it? – danmullen Oct 14 '14 at 13:03
  • Just changing "return $pos" to "return TRUE" makes the function name misleading (it won't return a "string position"). "strpos_array" is case sensitive and may return true when any string from the needle is a substring in the haystack. – Pedro Amaral Couto Oct 14 '14 at 14:09

The function strpos_arr returns the position of the first "needle" found in the string:

if(($pos = strpos($haystack, $what))!==false) return $pos;

or false if there aren't any "needles" in the text.

This means that strpos_arr($text, $bad_words) returns false if there is no bad word in the text. Otherwise it returns an integer with the position of the first bad word found in the string.

Notice that when the text starts with a bad word, it will return 0, which is treated as false in an if condition. That explains the behaviour you describe, where a spam word on its own and a spam word preceded by other text are handled differently.
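
The difference is easy to check with strpos directly. The snippet below is only an illustration; the sample strings and variable names are made up.

```php
<?php
$at_start = strpos('spam and more text', 'spam');  // match at the very beginning
$later    = strpos('some text then spam', 'spam'); // match further into the string
$missing  = strpos('clean text', 'spam');          // no match

var_dump($at_start);           // int(0)
var_dump($later);              // int(15)
var_dump($missing);            // bool(false)

// In a plain if condition, offset 0 is indistinguishable from "not found":
var_dump((bool) $at_start);    // bool(false)

// A strict comparison keeps the two cases apart:
var_dump($at_start !== false); // bool(true)
var_dump($missing !== false);  // bool(false)
```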

You could implement a function to find bad words like this:

function has_bad_word($text, array $bad_words) {
    // strpos_arr returns false only when no bad word was found.
    return strpos_arr($text, $bad_words) !== false;
}

Notice though that strpos_arr is case-sensitive and reports a match whenever any string from the needle is a substring of the haystack, even when it's only part of a larger word. This function addresses both issues:

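function has_bad_word($text, array $bad_words) {
    // Escape every word for use inside a /.../ pattern.
    $pregQuotedBadWords = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    // Any bad word delimited by whitespace or the ends of the text, ignoring case.
    $badWordsRegex = '/(?:\s|^)(?:' . implode('|', $pregQuotedBadWords) . ')(?:\s|$)/i';
    return preg_match($badWordsRegex, $text) === 1;
}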
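
To see how a whitespace-delimited, case-insensitive check behaves, here is a small self-contained sketch. The function name matches_bad_word and the word list are hypothetical; the function restates the same idea in compact form so the snippet runs on its own.

```php
<?php
// Hypothetical restatement of the whitespace-delimited, case-insensitive check.
function matches_bad_word(string $text, array $bad_words): bool {
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    $regex = '/(?:\s|^)(?:' . implode('|', $quoted) . ')(?:\s|$)/i';
    return preg_match($regex, $text) === 1;
}

$bad_words = array('spam', 'free money');

var_dump(matches_bad_word('Get FREE MONEY today', $bad_words));  // bool(true), case is ignored
var_dump(matches_bad_word('spammers are annoying', $bad_words)); // bool(false), only part of a longer word
var_dump(matches_bad_word('this is spam', $bad_words));          // bool(true), at the end of the text
var_dump(matches_bad_word('this is spam.', $bad_words));         // bool(false), "." is not whitespace
```

The last case shows a limitation of delimiting by whitespace only: a bad word followed directly by punctuation is not matched. Whether that matters depends on the input you expect.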
Pedro Amaral Couto

I've implemented something similar using a highlight library for jQuery. Basically, I provide a list of 700+ spam words and the library highlights each word that matches its regex. Have a look at the source code (here) to see how it's implemented.

Here's a snippet:

$(function () {
   $("#spam-checker--textarea").highlightWithinTextarea({
      highlight: [
        { highlight: /\baccess\b/gi, keyword: "Access", category: "urgency" },
        { highlight: /\baccess now\b/gi, keyword: "Access now", category: "urgency" },
        { highlight: /\bact\b/gi, keyword: "Act", category: "urgency" },
        { highlight: /\bact immediately\b/gi, keyword: "Act immediately", category: "urgency" },
        { highlight: /\bact now\b/gi, keyword: "Act now", category: "urgency" },
        { highlight: /\bact now!/gi, keyword: "Act now!", category: "urgency" },
        { highlight: /\baction\b/gi, keyword: "Action", category: "urgency" },
        { highlight: /\baction required\b/gi, keyword: "Action required", category: "urgency" },
        { highlight: /\bapply here\b/gi, keyword: "Apply here", category: "urgency" },
        { highlight: /\bapply now\b/gi, keyword: "Apply now", category: "urgency" },
        { highlight: /\bapply now!/gi, keyword: "Apply now!", category: "urgency" },
        { highlight: /\bapply online\b/gi, keyword: "Apply online", category: "urgency" },
        // ...
      ]
   })
})
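
The same word-boundary idea can be sketched on the server side in PHP. This is not the library's code: the function name find_spam_keywords and the short keyword list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns the keywords that occur in $text as whole words.
function find_spam_keywords(string $text, array $keywords): array {
    $found = array();
    foreach ($keywords as $keyword) {
        // \b matches between a word character and a non-word character.
        $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            $found[] = $keyword;
        }
    }
    return $found;
}

$keywords = array('Access', 'Act now', 'Apply online');

print_r(find_spam_keywords('Please act now to keep your access.', $keywords));
print_r(find_spam_keywords('An accessible action plan', $keywords));
```

The first call reports "Access" and "Act now"; the second reports nothing, because "accessible" and "action" only contain the keywords as parts of longer words. Keywords that end in punctuation, such as "Act now!", need extra care, since a trailing \b after "!" only matches when a word character follows.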
Frenchcooc