php preg_match get word with cyrillic characters

Question

I am trying to extract a word from a string, but the word may contain Cyrillic characters. Nothing I have tried so far works.

Here is my code:

$str = "Продавец:В KrossАдын рассказать друзьям  var addthis_config = {'data_track_clickback':true};";
$pattern = '/\s(\w*|.*?)\s/';
preg_match($pattern, $str, $matches);
echo $matches[0];

I need to get KrossАдын.

Thanks!

possible duplicate of [UTF-8 in PHP regular expressions](http://stackoverflow.com/questions/6407983/utf-8-in-php-regular-expressions) — John M., Sep 05 '14 at 15:29

Casimir et Hippolyte · Answer 1 · 2014-09-05T16:46:48.017

You can change the meaning of \w with the u modifier. With the u modifier, the subject string is read as UTF-8, and the \w character class is no longer [a-zA-Z0-9_] but [\p{L}\p{N}_]:

$pattern = '/\s(\w*|.*?)\s/u';
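
As a rough illustration of the difference, the sketch below runs the same anchored \w+ pattern with and without the u modifier. It assumes the file is saved as UTF-8 and that PHP runs under the default C locale; the variable names are only illustrative.

```php
<?php
// Assumption: this source file is UTF-8 encoded.
$word = 'KrossАдын';

// Byte mode: \w only covers [a-zA-Z0-9_], so the Cyrillic bytes break the match.
$byteMode = preg_match('/^\w+$/', $word);

// UTF-8 mode: \w also covers Unicode letters and digits, so the whole word matches.
$utf8Mode = preg_match('/^\w+$/u', $word);

var_dump($byteMode, $utf8Mode); // int(0) and int(1)
```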

Note that the alternation in the pattern is pointless: the second branch can match everything the first branch can. Whatever \w* matches can also be matched by .*?, because a whitespace character must follow on the right. Both subpatterns end up matching the characters between two whitespace characters.

Writing $pattern = '/\s(.*?)\s/u'; does exactly the same thing. Better still:

$pattern = '/\s(\S*)\s/u';

This avoids the lazy quantifier.
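
A minimal sketch of the \S* variant applied to a shortened version of the question's string. Note that index 0 of the result holds the whole match, including the surrounding whitespace, while the word itself is in capture group 1. The names $whole and $word are assumptions made for this example.

```php
<?php
// Assumption: this source file is UTF-8 encoded.
$str = "Продавец:В KrossАдын рассказать друзьям";

// \S* greedily consumes non-whitespace and stops at the next whitespace,
// so no lazy quantifier is needed.
if (preg_match('/\s(\S*)\s/u', $str, $m)) {
    $whole = $m[0]; // whole match, with the leading and trailing space
    $word  = $m[1]; // capture group 1, the word only
    echo $word, "\n";
}
```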

If your goal is only to match ASCII and Cyrillic letters, the most efficient pattern (for character classes, the smaller the class, the faster the match) is:

$pattern = '~(*UTF8)[a-z\p{Cyrillic}]+~i';

(*UTF8) tells the regex engine to read the subject string as UTF-8.

\p{Cyrillic} is a Unicode script property that matches only characters of the Cyrillic script.
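
The character class can be tried out as follows. This is only a sketch: it uses the u modifier instead of the inline (*UTF8) verb, since support for that verb may differ between PCRE versions, and it collects every match with preg_match_all because the unanchored pattern would otherwise return the first word in the string rather than the one wanted.

```php
<?php
// Assumption: this source file is UTF-8 encoded.
$str = "Продавец:В KrossАдын рассказать друзьям";

// Each run of ASCII or Cyrillic letters becomes one match;
// the colon and the spaces act as separators.
preg_match_all('~[a-z\p{Cyrillic}]+~iu', $str, $m);
$words = $m[0];

print_r($words);  // five words; "KrossАдын" is the third one
echo $words[2], "\n";
```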

score 1 · Accepted Answer · edited May 23 '17 at 11:56

1

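The issue is that your string contains multibyte UTF-8 characters, which \w does not match by default. See this Stack Overflow question for a solution: [UTF-8 in PHP regular expressions](http://stackoverflow.com/questions/6407983/utf-8-in-php-regular-expressions)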

Essentially, you'll want to add the u modifier at the end of your expression, and use \p{L} instead of \w.
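
A short sketch of that suggestion, assuming a UTF-8 source file. The second input string is hypothetical and is there only to show that \p{L} is stricter than \w: it matches letters but not digits or underscores.

```php
<?php
// Assumption: this source file is UTF-8 encoded.
$str = "Продавец:В KrossАдын рассказать друзьям";

// \p{L} matches any Unicode letter when the u modifier is set.
if (preg_match('/\s(\p{L}+)\s/u', $str, $m)) {
    echo $m[1], "\n";
}

// Hypothetical input: \p{L}+ stops at the underscore, \w+ does not.
preg_match('/^\p{L}+/u', 'Адын_2014', $letters);
preg_match('/^\w+/u', 'Адын_2014', $wordish);
echo $letters[0], " / ", $wordish[0], "\n";
```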

edited May 23 '17 at 11:56


answered Sep 05 '14 at 15:29

John M.
