Here is my code (it works correctly for English):

$str1 = "itt is a testt";
$str2 = "it is a testt";
$str3 = "itt is a test";
$str4 = "it is a test";

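echo preg_match("[\b(?:it|test)\b]", $str1) ? 1 : 2; // output: 2 (does not match)
echo preg_match("[\b(?:it|test)\b]", $str2) ? 1 : 2; // output: 1 (matches)
echo preg_match("[\b(?:it|test)\b]", $str3) ? 1 : 2; // output: 1 (matches)
echo preg_match("[\b(?:it|test)\b]", $str4) ? 1 : 2; // output: 1 (matches)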

But I don't know why the above regex does not work correctly for Persian (it always returns 1):

$str1 = "دیوار";
$str2 = "دیوارر";

echo preg_match("/[\b(?:دیوار|خوب)\b]/u", $str1) ? 1 : 2; // output: 1
echo preg_match("/[\b(?:دیوار|خوب)\b]/u", $str2) ? 1 : 2; // output: 1 (it should be 2)

How can I fix it?

hjpotter92
Shafizadeh

2 Answers

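In "/[\b(?:دیوار|خوب)\b]/u" you've put your regex inside a character class, so it matches any single one of the listed characters. (In the English example there were no / delimiters, so the [ and ] acted as the pattern delimiters instead.) Remove the [] from it: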

"/\b(?:دیوار|خوب)\b/u"

You could also replace the \b with an alternative:

"/(?:^|\s)(?:دیوار|خوب)(?:\s|$)/u"

You could also replace the \s with a negated character class that lists the Arabic letters. I don't know them all, but it would look like: [^دیوارخوب]...
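
A rough sketch of how the whitespace-based pattern above behaves once punctuation is involved, next to a lookaround variant. The lookaround pattern and the sample strings are assumptions added for illustration, not part of the original answer. \p{L} only covers letters, so digits, combining marks or a zero-width non-joiner next to the word would need extra handling.

```php
<?php
// Whitespace-delimited pattern from the text above.
$spacePattern = '/(?:^|\s)(?:دیوار|خوب)(?:\s|$)/u';

// Assumed alternative: the word must be neither preceded nor followed
// by a Unicode letter (\p{L}).
$letterPattern = '/(?<!\p{L})(?:دیوار|خوب)(?!\p{L})/u';

// Illustrative inputs: the bare word, the word with an extra letter,
// and the word wrapped in commas.
$samples = ['دیوار', 'دیوارر', ',دیوار,'];

foreach ($samples as $sample) {
    printf(
        "%s: space=%d letter=%d\n",
        $sample,
        preg_match($spacePattern, $sample),
        preg_match($letterPattern, $sample)
    );
}
```

Both patterns accept the bare word and reject the one with the extra letter, but only the lookaround version still matches when the word sits between commas.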

Toto

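Inside a character class, \b is a backspace character, not a word boundary. PHP itself leaves "\b" untouched in a double-quoted string, but other backslash sequences (such as \t or \$) are interpreted there, so the escaping is easier to get wrong.

That is why the correct answer is: remove the square brackets, and either declare the regex in single quotes so that no double-escaping is needed, or double the backslashes before b inside a double-quoted regex.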

  • '/\b(?:دیوار|خوب)\b/u' or...
  • "/\\b(?:دیوار|خوب)\\b/u"

See this IDEONE demo:

echo preg_match('/\b(?:دیوار|خوب)\b/u', $str1) ? 1 : 2; // output: 1
echo preg_match('/\b(?:دیوار|خوب)\b/u', $str2) ? 1 : 2; // output: 2
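
As a small, hedged illustration of the points above: the single-quoted and double-backslash spellings produce the same pattern string, [\b] matches a backspace character, and the bracketed pattern from the question behaves as one character class. The variable names are made up for this sketch.

```php
<?php
// The same pattern written both ways; the resulting strings are identical.
$single = '/\b(?:دیوار|خوب)\b/u';
$double = "/\\b(?:دیوار|خوب)\\b/u";
var_dump($single === $double); // bool(true)

// Inside a character class, \b is the backspace character (U+0008).
var_dump(preg_match('/[\b]/', "\x08")); // int(1)
var_dump(preg_match('/[\b]/', 'b'));    // int(0)

// The bracketed pattern from the question is a single character class, so
// any one listed character (for example د) is enough for a match.
$class = '/[\b(?:دیوار|خوب)\b]/u';
foreach (['دیوار', 'دیوارر'] as $word) {
    printf("class=%d boundary=%d\n", preg_match($class, $word), preg_match($single, $word));
}
```

The loop should report a match for both words with the character class, and a match only for the first word with the word-boundary pattern.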
Wiktor Stribiżew
  • Excuse me, I don't know English very well. I want to know: is *'workaround'* the same as *'solution'*? – Shafizadeh Nov 12 '15 at 15:14
  • "Workaround" means a temporary solution that does not work as expected 100% of the time, or works more slowly, used because a real solution is impossible or too difficult/time-consuming to create. Try with words inside commas: Toto's solution won't work. – Wiktor Stribiżew Nov 12 '15 at 15:20