
I have a list of forbidden words and need to check whether any of them occurs in a given string. My current code only partially works.

A match should be true if and only if:

  1. any of the words in the string is an exact match with any of the forbidden words, e.g.: the pool is cold.
  2. any of the words in the string starts with any of the forbidden words, e.g.: the poolside is yellow.

A match should be false otherwise. That includes both of the following cases, which currently fail:

  1. if any of the words in the string ends with any of the forbidden words, e.g.: the carpool lane is closed.
  2. if any of the words in the string contains any of the forbidden words, e.g.: the print spooler is not working.

Current code:

$forbidden = array('pool', 'cat', 'rain');

// example: no matching words at all
$string = 'hello and goodbye'; //should be FALSE - working fine

// example: pool
$string = 'the pool is cold'; //should be TRUE - working fine
$string = 'the poolside is yellow'; //should be TRUE - working fine
$string = 'the carpool lane is closed'; //should be FALSE - currently failing
$string = 'the print spooler is not working'; //should be FALSE - currently failing

// example: cat
$string = 'the cats are wasting my time'; //should be TRUE - working fine
$string = 'the cat is wasting my time'; //should be TRUE - working fine
$string = 'joe is using the bobcat right now'; //should be FALSE - currently failing

// match finder
if(preg_match('('.implode('|', $forbidden).')', $string)) {
    echo 'match!';
} else {
    echo 'no match...';
}

Relevant optimization note: the official $forbidden words array has over 350 items, and the average given $string will have around 25 words. So, it would be great if the solution stops the preg_match process as soon as it finds the first occurrence.

Andres SK
  • So all you want is `if(preg_match('/\b(?:'.implode('|', $forbidden).')/', $string)) {`? https://3v4l.org/gO5kR. See https://stackoverflow.com/a/14719293/3832970 – Wiktor Stribiżew Oct 18 '20 at 16:34
  • `"/\b(".implode("|",$forbidden).")/i"` -- the `\b` ensures the match only *starts* at a word, rather than being at the middle or end. It works for whole words too since they also "start" with the match. – Niet the Dark Absol Oct 18 '20 at 16:34

1 Answer


The key is to put the \b word-boundary assertion in front of the alternation, so that a forbidden word can only match at the start of a word:

<?php
$forbidden = ['pool', 'cat', 'rain'];

// Examples, with the expected result noted for each
$examples = [
    // pool:
    'the pool is cold', // expected TRUE (exact word)
    'the poolside is yellow', // expected TRUE (word starts with "pool")
    'the carpool lane is closed', // expected FALSE (word ends with "pool")
    'the print spooler is not working', // expected FALSE ("pool" is inside a word)

    // cat:
    'the cats are wasting my time', // expected TRUE
    'the cat is wasting my time', // expected TRUE
    'joe is using the bobcat right now', // expected FALSE
];

// \b lets a forbidden word match only at the start of a word; the i flag ignores case
$pattern = '/\b(' . implode ('|', $forbidden) . ')/i';

foreach ($examples as $example) {
    echo ((preg_match ($pattern, $example) ? 'TRUE' : 'FALSE') . ': ' . $example . "\n");
}

http://sandbox.onlinephpfunctions.com/code/f424e6c78d3b13905486f646667c8bc9d48eda3a
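
With a list of 350+ words, it may be worth building the pattern once and escaping each word first, in case a forbidden word ever contains a regex metacharacter. The following is a minimal sketch of that idea, not part of the original answer; the helper names `buildForbiddenPattern` and `hasForbiddenWord` are hypothetical. Note that `preg_match` already stops at the first match, so no extra work is needed for early exit.

```php
<?php
// Hypothetical helper: builds one regex from the whole forbidden list.
function buildForbiddenPattern(array $forbidden): string
{
    // Escape each word so characters such as "." or "+" are matched literally.
    $escaped = array_map(
        static function (string $word): string {
            return preg_quote($word, '/');
        },
        $forbidden
    );

    // \b anchors the match to the start of a word;
    // (?:...) groups the alternatives without capturing them.
    return '/\b(?:' . implode('|', $escaped) . ')/i';
}

// Hypothetical helper: true if any forbidden word starts a word in $text.
// preg_match returns 1 as soon as it finds the first match.
function hasForbiddenWord(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// Build the pattern once, then reuse it for every string to check.
$pattern = buildForbiddenPattern(['pool', 'cat', 'rain']);

var_dump(hasForbiddenWord($pattern, 'the poolside is yellow'));     // bool(true)
var_dump(hasForbiddenWord($pattern, 'the carpool lane is closed')); // bool(false)
```

An empty `$forbidden` array would produce a pattern that matches at any word start, so real code would probably want to guard against that case.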

Alexander Mashin
  • Your answer worked perfectly. Sadly, my question was marked as a duplicate. – Andres SK Oct 18 '20 at 20:53
  • I know this was not in the original question, but I'm also trying to match póol (without having to add póol, poól or póól variants to the forbidden words array). I thought that adding the `u` flag to the `preg_match` regex would be enough, but it wasn't. Any ideas? – Andres SK Oct 18 '20 at 22:54
  • The best solution that I can offer is to create a `function antispoof (string $forbidden): string` that replaces any "o" in `$forbidden` with `[oóòōоο]\u00b4?\u0300?` (this includes Cyrillic о and Greek ο, but the class should really be longer), and likewise for every other letter. However, there may be ready-made anti-spoofing solutions for PHP. – Alexander Mashin Oct 19 '20 at 03:38
  • I appreciate your help – Andres SK Oct 19 '20 at 04:10
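
The `antispoof` idea from the comments could be sketched roughly as follows. This is an illustration under several assumptions, not a vetted anti-spoofing solution: the look-alike table is deliberately tiny, the forbidden words are assumed to be lowercase, the source file is assumed to be saved as UTF-8, and `containsForbiddenAccented` is a hypothetical name. Instead of the specific accent code points from the comment, the sketch allows any combining mark (`\p{Mn}`) after each letter, and it replaces `\b` with a Unicode-aware lookbehind so that accented letters count as part of a word.

```php
<?php
// Expands each letter of a forbidden word into a class of look-alikes.
function antispoof(string $forbidden): string
{
    // Illustrative, incomplete table (an assumption); extend as needed.
    // The "o" entry also contains Cyrillic о and Greek ο.
    $lookalikes = [
        'a' => 'aáàâä',
        'e' => 'eéèêë',
        'i' => 'iíìîï',
        'o' => 'oóòôöōоο',
        'u' => 'uúùûü',
    ];

    $pattern = '';
    // Split on UTF-8 character boundaries rather than on bytes.
    foreach (preg_split('//u', $forbidden, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $pattern .= isset($lookalikes[$char])
            ? '[' . $lookalikes[$char] . ']'
            : preg_quote($char, '/');
        // Also accept decomposed input: a base letter plus combining marks.
        $pattern .= '\p{Mn}*';
    }

    return $pattern;
}

// Hypothetical helper: true if a (possibly accented) forbidden word starts a word.
function containsForbiddenAccented(array $forbidden, string $text): bool
{
    $alternatives = implode('|', array_map('antispoof', $forbidden));
    // The lookbehind requires that no letter, digit or underscore precedes the match.
    $pattern = '/(?<![\p{L}\p{N}_])(?:' . $alternatives . ')/iu';

    return preg_match($pattern, $text) === 1;
}

$forbidden = ['pool', 'cat', 'rain'];

var_dump(containsForbiddenAccented($forbidden, 'the póol is cold'));           // bool(true)
var_dump(containsForbiddenAccented($forbidden, 'the carpóol lane is closed')); // bool(false)
```

The per-letter character classes keep the forbidden list itself unchanged, at the cost of a longer pattern; with a few hundred words that is likely still acceptable, but it is worth measuring.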