preg_match and accented characters

Asked Nov 26 '17 at 16:54

Active Nov 26 '17 at 16:54

Viewed 75 times

I have a problem. The word "lyža" has 4 characters, but the "strlen" function reports 5. I can work around that by using "iconv_strlen". However, when I use "preg_match" with PREG_OFFSET_CAPTURE, it reports a different position than I expect. The following code illustrates the problem. In both cases the dashes start at position 4 (counting from 0), but the accented character shifts the reported offset in the second case.

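// ASCII only: the dashes are reported at offset 4
$line = 'lyza------';
preg_match('/[-]+/', $line, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

// With the accented "ž": the dashes are reported at offset 5 instead of 4
$line2 = 'lyža------';
preg_match('/[-]+/', $line2, $matches2, PREG_OFFSET_CAPTURE);
print_r($matches2);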


Just count the number of chars in the match with `mb_strlen`. – Wiktor Stribiżew Nov 26 '17 at 16:59
`ž` is one character, but two bytes. And `PREG_OFFSET_CAPTURE` also counts in bytes. See also: [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – mario Nov 26 '17 at 17:01
I need a position – Nov 26 '17 at 17:01
You can use [`mb_strpos`](http://php.net/manual/en/function.mb-strpos.php) to get the location of the first dash. It counts in characters rather than bytes, so multibyte Unicode characters are handled correctly. – h2ooooooo Nov 26 '17 at 17:19
What I wrote above is only part of the problem. I need to find the position of a match for a pattern such as: ([-]+[a-z]+[-]+) – Nov 26 '17 at 17:28
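
One way to follow up on the comments above: since `PREG_OFFSET_CAPTURE` reports byte offsets, a character position can be recovered by cutting the subject at that byte offset with `substr` and counting the characters in the prefix with `mb_strlen`. The sketch below is only an illustration, not a confirmed answer. It assumes the string is UTF-8, that the mbstring extension is available, and the helper name `byte_to_char_offset` and the sample string `$line3` are made up for this example.

```php
<?php
// Hypothetical helper (not part of PHP): converts the byte offset reported by
// PREG_OFFSET_CAPTURE into a character offset. Assumes $subject is UTF-8.
function byte_to_char_offset(string $subject, int $byteOffset): int
{
    // substr() cuts by bytes; mb_strlen() then counts characters in that prefix.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$line2 = 'lyža------';
preg_match('/[-]+/', $line2, $matches2, PREG_OFFSET_CAPTURE);

$byteOffset = $matches2[0][1];                           // byte offset: "ž" takes two bytes
$charOffset = byte_to_char_offset($line2, $byteOffset);  // offset counted in characters

// Same idea with the longer pattern from the comment; $line3 is an invented sample.
$line3 = 'lyža---abc---';
preg_match('/([-]+[a-z]+[-]+)/', $line3, $matches3, PREG_OFFSET_CAPTURE);

$charOffset3 = byte_to_char_offset($line3, $matches3[1][1]); // start of group 1 in characters
$charLength3 = mb_strlen($matches3[1][0], 'UTF-8');          // length of group 1 in characters

printf("bytes: %d, chars: %d\n", $byteOffset, $charOffset);
printf("group 1 starts at char %d and is %d chars long\n", $charOffset3, $charLength3);
```

The conversion is done after matching because the offsets stay in bytes even when the pattern uses the `u` modifier. The matched text itself can be measured with `mb_strlen`, as suggested in the first comment.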

0 Answers