In this example I have the word así, which ends in the accented character í.

 $str = "A string containing the word así which should be changed to color purple";

  $prac[] = "/\basí\b/i";
  $prac2[] = "<span class='readword'  style='color:purple'>\$0 </span>";

 $str= preg_replace($prac,$prac2,$str);

 echo $str;

It does not change. But if I have a word that does not end or begin with an accented character it DOES change. For example:

 $str = "A string containing another word which should be changed to color purple";
  $prac[] = "/\banother word\b/i";
  $prac2[] = "<span class='readword'  style='color:purple'>\$0 </span>";

 $str= preg_replace($prac,$prac2,$str);

 echo $str;
 ?>

If the accent is in the middle of the word, it also always works. I also tested the array and preg_replace separately with this word, and neither seems to be the problem on its own. It only fails when I pass the array as a parameter to preg_replace.

Please help, I can't find any information on this anywhere.

Thank you

2 Answers

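Apparently PHP does not treat an accented character as a word character here: without the u modifier the pattern is matched byte by byte, and by default only ASCII letters, digits and the underscore count as word characters. The three conditions under which a word boundary \b matches are: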

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

Source: https://www.regular-expressions.info/wordboundaries.html

So /\basí\b/i does not match así inside your string, because none of the three conditions is met at the end of the word. The first and second do not apply because así is in the middle of the string. The third requires two adjacent characters where one is a word character and the other is not, but here we have í followed by a space, and neither is treated as a word character.
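The reasoning above can be checked with a few lines of PHP. This is only a minimal sketch: it assumes the file is saved as UTF-8 and forces the "C" locale so that the byte-mode result does not depend on the server's locale settings. The sample sentence and variable names are made up for illustration.

```php
<?php
// Assumption: this file is UTF-8, so "í" is stored as the two bytes 0xC3 0xAD.
// Assumption: the "C" locale, so only ASCII letters, digits and "_" are word characters in byte mode.
setlocale(LC_CTYPE, 'C');

$word = "así";
$str  = "la palabra así cambia";

// strlen() counts bytes, so the accented letter contributes two.
$bytes = strlen($word);

// Without the u modifier, \w is tested against single bytes and neither byte of "í" qualifies.
$plain = preg_match('/\w/', "í");

// With the u modifier the subject is decoded as UTF-8 and "í" counts as a letter.
$unicode = preg_match('/\w/u', "í");

// Consequence for \b: no boundary is seen between "í" and the following space in byte mode.
$boundaryPlain   = preg_match('/\basí\b/i', $str);
$boundaryUnicode = preg_match('/\basí\b/iu', $str);

var_dump($bytes, $plain, $unicode, $boundaryPlain, $boundaryUnicode);
// int(4) int(0) int(1) int(0) int(1)
```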

I'm not completely sure my understanding is correct, though.

As a workaround, you can replace your regular expression with /\basí(\b|\s+)/i
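A quick sketch of how that workaround behaves, again assuming a UTF-8 source file and the "C" locale. Note that the \s+ branch makes the trailing whitespace part of the match. The second pattern, which uses a lookahead instead, is my own variation and not part of the suggestion above; the sample sentence and the square brackets are only there to make the match visible.

```php
<?php
// Assumption: UTF-8 source file and "C" locale, as in the question's setup.
setlocale(LC_CTYPE, 'C');

$str = "la palabra así cambia";

// Suggested workaround: \b fails after "í", so \s+ matches and the space ends up inside $0.
$withSpace = preg_replace('/\basí(\b|\s+)/i', '[$0]', $str);

// Variation (not from the answer): a lookahead checks for whitespace or end of string
// without consuming it, so $0 is just the word.
$withoutSpace = preg_replace('/\basí(?=\s|$)/i', '[$0]', $str);

echo $withSpace, "\n";     // la palabra [así ]cambia
echo $withoutSpace, "\n";  // la palabra [así] cambia
```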

See also: Regex word boundary issue when angle brackets are adjacent to the boundary

And http://php.net/manual/en/function.preg-replace.php#89471

Ermac

Use the Unicode flag (the u modifier):

$str = "A string containing the word así which should be changed to color purple";
$prac[] = "/\basí\b/iu";
#             here __^
$prac2[] = "<span class='readword'  style='color:purple'>\$0 </span>";
$str= preg_replace($prac,$prac2,$str);
echo $str;

Result for given example:

A string containing the word <span class='readword'  style='color:purple'>así </span> which should be changed to color purple
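If the patterns are built from a list of words, as the question's use of arrays suggests, the same flag can be applied to each of them. The following is only a sketch under that assumption: highlightWords() is a hypothetical helper, not something from the question, and it drops the space before </span> that the original replacement string contains.

```php
<?php
// Hypothetical helper: wraps every listed word in the span used in the question.
function highlightWords(string $text, array $words): string
{
    $patterns = [];
    foreach ($words as $word) {
        // preg_quote() escapes regex metacharacters; "/" is passed because it is the delimiter.
        // The u modifier makes \b treat accented letters as word characters.
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/iu';
    }

    $replacement = "<span class='readword' style='color:purple'>\$0</span>";

    // preg_replace() returns null on error (for example invalid UTF-8), so fall back to the input.
    return preg_replace($patterns, $replacement, $text) ?? $text;
}

$result = highlightWords("La palabra así y another word", ["así", "another word"]);
echo $result, "\n";
// La palabra <span class='readword' style='color:purple'>así</span> y <span class='readword' style='color:purple'>another word</span>
```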
Toto