$result = 'ဖန္တ';

$result = preg_replace(
            "/([\p{L}\p{N}A-Za-z0-9@#\".]{1,}[\p{L}\p{N}A-Za-z0-9\.\_-]{0,})/u",
            "foo[('$0')]bar",
            $result);

print_r($result);

//RESULT: foo[('ဖန')]bar္foo[('တ')]bar 

Notice the bar္foo in the output. Why am I seeing this stray character, and how do I remove it? If I use hello world as the input string instead, I get the expected result:

foo[('hello')]bar foo[('world')]bar
mickmackusa
Priy Ranjan

1 Answer

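It looks like MYANMAR SIGN VIRAMA (U+1039), a combining mark, falls outside the character classes that you have written: \p{L} matches letters and \p{N} matches numbers, but combining marks belong to \p{M}.

If you were to execute the following, with $input holding your string: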

var_dump(preg_split('//u', $input, 0, PREG_SPLIT_NO_EMPTY));

You would see that the individual characters in the string are: (Demo)

array(4) {
  [0]=>
  string(3) "ဖ"
  [1]=>
  string(3) "န"
  [2]=>
  string(3) "္"
  [3]=>
  string(3) "တ"
}
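
To check which of those characters the original character class can actually match, each one can be tested against \p{M} and against [\p{L}\p{N}]. This is only a minimal sketch: the variable names and the category labels are illustrative, and the characters are printed as hex bytes because a combining mark on its own may not render clearly.

```php
<?php
$input = 'ဖန္တ';

// Split into individual code points, as in the var_dump() call above.
$chars = preg_split('//u', $input, 0, PREG_SPLIT_NO_EMPTY);

$report = [];
foreach ($chars as $char) {
    if (preg_match('/^\p{M}$/u', $char)) {
        $category = 'mark';              // not covered by \p{L} or \p{N}
    } elseif (preg_match('/^[\p{L}\p{N}]$/u', $char)) {
        $category = 'letter or number';  // covered by the original class
    } else {
        $category = 'other';
    }
    $report[] = [bin2hex($char), $category];
    printf("%s  %s\n", bin2hex($char), $category);
}
```

The third character should be reported as a mark, which is why the original pattern stops matching just before it and starts a new match just after it.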

If you just want to replace combining marks with spaces, then add a second pattern that makes another pass over the string and swaps each mark for a space.

Code: (Demo)

$input = 'ဖန္တ';

echo preg_replace(
         ['/[\p{L}\p{N}@#".]+[\p{L}\p{N}._-]*/u', '/\p{M}/u'],
         ["foo[('$0')]bar", ' '],
         $input
     );

Output:

foo[('ဖန')]bar foo[('တ')]bar
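
If you would rather keep the combining mark attached to its neighbouring letters instead of replacing it, one possible alternative is to add \p{M} to both character classes so that the whole sequence is wrapped as a single match. This is a sketch under the assumption that marks should count as part of a word; it is not something the question asked for.

```php
<?php
$input = 'ဖန္တ';

// Same pattern as above, with \p{M} added so combining marks
// are matched together with the letters around them.
$output = preg_replace(
    '/[\p{L}\p{N}\p{M}@#".]+[\p{L}\p{N}\p{M}._-]*/u',
    "foo[('$0')]bar",
    $input
);

echo $output; // foo[('ဖန္တ')]bar
```

Note that with this pattern a string that begins with a stray combining mark would also be matched, which may or may not be what you want.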
mickmackusa