I have a PHP function that checks whether a string contains specific (full) 'words' from an array (some of these 'words' may start with a special character followed by a space, or end with a space). The problem is with 'words' that start with special characters such as +, -, /, $, or #. Why doesn't this 'contains' function catch such words? I added preg_quote to it and it still doesn't work.


$bads = array('+11'," - 68",'[img','$cool ', "# hash"); 
// disallowed full 'words'; some may start with a special character + space or end with a space; if one of them appears in the string, the function should return true

$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
//another example to test: $s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
foreach($bads as $a) {
$a=preg_quote($a,'/');
if(preg_match("/\b".$a."\b/",$str)) return true;
}
return false;
}
Tomasz
  • Could it be because $ means a variable, so you need to escape it to make it a literal string? See https://3v4l.org/S3MgL: it returns an error about a missing variable, but writing "\$cool" does not produce the same error. – Andreas Jun 27 '17 at 15:30
  • And what is the expected output of your function? As your code stands, it returns `Contains! `. Isn't that correct? – Andreas Jun 27 '17 at 15:31
  • I think it's not only about $ (it won't find +11 etc. from the array even if $cool is not present). Still, it'd be best to find such $XXX words too; I thought preg_quote would 'sanitize' them somehow. Yes, it should just display 'Contains!' if TRUE; that's only there to show whether it works. – Tomasz Jun 27 '17 at 15:32
  • Just a tip: I suggest you write it as `"/\b" . $a . "\b/"` to avoid confusion, because these characters mean something in regexes. – Kaddath Jun 27 '17 at 15:34

3 Answers

Intuition breaks down when applying a word boundary (\b) to a pattern that contains non-word characters. More on that here. What you seem to want, for this case, is \s:

function contains($str, $bads)
{
    $template = '/(\s+%1$s\s+|^\s*%1$s\s+|\s+%1$s\s*$|^\s*%1$s\s*$)/';
    foreach ($bads as $a) {
        $regex = sprintf($template, preg_quote($a, '/'));
        if (preg_match($regex, $str)) {
            return true;
        }
    }
    return false;
}

See it in action at 3v4l.org.

The regex checks for four different cases, each separated by |:

  1. One or more spaces, the bad pattern, then one or more spaces.
  2. Start of input, zero or more spaces, the bad pattern, then one or more spaces.
  3. One or more spaces, the bad pattern, zero or more spaces, then end of input.
  4. Start of input, zero or more spaces, the bad pattern, zero or more spaces, then end of input.
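
As a possible shorthand for the four cases above, the same "whitespace or edge of input on both sides" condition can be written with lookarounds. This is only a sketch: the name `containsDelimited` is hypothetical, and trimming each entry is an assumption (it treats the spaces inside the array entries as delimiters, not as part of the word).

```php
<?php
// Hypothetical helper, not part of the original answer.
function containsDelimited(string $str, array $bads): bool
{
    foreach ($bads as $bad) {
        // (?<!\S): no non-whitespace character directly before the word
        // (?!\S):  no non-whitespace character directly after the word
        // trim() is an assumption: surrounding spaces are treated as delimiters.
        $regex = '/(?<!\S)' . preg_quote(trim($bad), '/') . '(?!\S)/';
        if (preg_match($regex, $str) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(containsDelimited('This is +11 test', ['+11', '[img']));          // bool(true)
var_dump(containsDelimited('This [imgas 8 is hash not', ['+11', '[img'])); // bool(false)
```

Unlike `\s+`, the lookarounds consume no characters, so a trailing space in an array entry no longer demands a second space in the input.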

If you could guarantee that all of your bad patterns contained only word characters ([0-9A-Za-z_]), then \b would work just fine. Since that is not true here, you need a more explicit pattern.
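
To make the \b behaviour concrete, here is a small illustration (the test strings are made up for this sketch). \b only matches at a position where a word character sits next to a non-word character, so it cannot match between a space and a `+`:

```php
<?php
// In ' +11', both the space and '+' are non-word characters, so there is
// no word boundary directly before '+', and the match fails.
var_dump(preg_match('/\b\+11\b/', 'This is +11 test'));  // int(0)

// Dropping the leading \b lets the same text match: the trailing \b sits
// between '1' (word character) and ' ' (non-word character).
var_dump(preg_match('/\+11\b/', 'This is +11 test'));    // int(1)

// A pattern made only of word characters behaves as expected.
var_dump(preg_match('/\bcool\b/', 'is it cool or not')); // int(1)
```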

bishop
  • In this function, where is the $matches variable coming from? (Not sure if I can make it work for now ;) – Tomasz Jun 27 '17 at 16:44
  • PHP provides that: $matches is a pass-by-reference, output-only variable. It was there for debugging purposes. I removed it from the SO answer. – bishop Jun 27 '17 at 16:47

There are a few changes...

<?php
error_reporting ( E_ALL );
ini_set ( 'display_errors', 1 );
$bads = array("+11","- 68","[img",'$cool', "# hash"); 
// disallowed full 'words'; if one of them appears in string, 
// the function should return true

$s= 'This is +11 test to show if $cool or [img works but it does $cool not';
$s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
    echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
    foreach($bads as $a) {
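        // The second argument must match the regex delimiter used below ('/').
        $a=preg_quote($a,'/');
        if(preg_match("/$a/",$str)) return true;
    }
    return false;
}

I've used single quotes round the $cool value and removed the \b's from the preg_match, as some options are effectively multiple words. The second argument to preg_quote stays '/', because it has to match the delimiter of the regex it is inserted into.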
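
For reference, a short sketch of what preg_quote does with some of these values (the `a/b` input is made up for illustration). The second argument names the regex delimiter, which is escaped in addition to the usual metacharacters:

```php
<?php
echo preg_quote('+11', '/'), "\n";   // \+11
echo preg_quote('[img', '/'), "\n";  // \[img
echo preg_quote('$cool', '/'), "\n"; // \$cool
echo preg_quote('a/b', '/'), "\n";   // a\/b
echo preg_quote('a/b'), "\n";        // a/b (the slash is left alone without the delimiter argument)
```

An unescaped `/` inside a pattern delimited by `/` would end the pattern early, which is why the delimiter argument matters when a disallowed word can contain a slash.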

Nigel Ren
  • Thanks, but I think \b MUST be there, because now this string returns true (but it should not): $s= 'This [imgas 8 is hash not'; – Tomasz Jun 27 '17 at 15:43
  • The problem is that \b is a word boundary, and special characters don't count as part of a word. So with [img (for example) the word boundary falls between the '[' and the 'i', not before the '['. So word boundaries will not work this way. – Nigel Ren Jun 27 '17 at 15:51
  • You could try `\s`, which would match any whitespace, instead of `\b`. – Nigel Ren Jun 27 '17 at 15:52
  • That's a good idea (even though it then won't catch a word that doesn't start or end with a space), and it's a better solution than mine. – Tomasz Jun 27 '17 at 15:56

This is the best I can do.

https://3v4l.org/C8KqP

So build a single string containing the regex, and if a word starts with $, do not add \b around it.
I guess this has to be modified to fit your code, but you can see the concept.
Also, since I run only one regex with all the words, it should be more efficient than checking one word at a time.

$bads = array('+11','- 68','[img','$cool', '# hash'); // disallowed full 'words'; if one of them appears in string, the function should return true

$s= 'This is test to show if or $cool works but it does not';
//another example to test: $s= 'This - 68 is # hash not';

if(contains($s,$bads)) {
echo 'Contains! ';
}

#### FUNCTION ###

function contains($str, $bads)
{
    $b = "/";
    foreach($bads as $a) {
        if(substr($a,0,1) == "$"){
            $b .= preg_quote($a,'/'). "|";
        }else{
            $b .= "\b" . preg_quote($a,'/'). "\b|";
        }
    }
    $b = substr($b, 0,-1) ."/";
    if(preg_match($b,$str, $m)){
        return true;    
    } 

    return false;
}
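
A variant of the same single-regex idea, sketched under the assumption that a "full word" is one delimited by whitespace or by the start or end of the string. The name `containsAny` is hypothetical; it swaps \b for lookarounds so that entries starting with +, - or [ are handled the same way as those starting with $:

```php
<?php
// Hypothetical helper; assumes words are delimited by whitespace or string edges.
function containsAny(string $str, array $bads): bool
{
    if (count($bads) === 0) {
        return false; // avoid building an empty alternation
    }
    // Escape every entry for use inside a '/'-delimited pattern.
    $quoted = array_map(function ($bad) {
        return preg_quote($bad, '/');
    }, $bads);
    // One alternation, guarded on both sides by "no non-whitespace neighbour".
    $regex = '/(?<!\S)(?:' . implode('|', $quoted) . ')(?!\S)/';
    return preg_match($regex, $str) === 1;
}

$bads = array('+11', '- 68', '[img', '$cool', '# hash');
var_dump(containsAny('This is test to show if or $cool works but it does not', $bads)); // bool(true)
var_dump(containsAny('This - 68 is # hash not', $bads));                                // bool(true)
var_dump(containsAny('This [imgas 8 is hash not', $bads));                              // bool(false)
```
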
Andreas