Multi-language str_word_count in PHP

Asked Jun 27 '16 at 20:49

Active Jun 27 '16 at 20:58

Viewed 246 times

If I use the str_word_count() PHP function on Russian text, it returns an incorrect result. A workaround is to use something like:

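function my_word_count($str) {
    // Split on runs of characters that are not letters, digits, or apostrophes.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces left by leading/trailing separators.
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str, -1, PREG_SPLIT_NO_EMPTY));
}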

However, this function may not work for text in other languages, which complicates the task considerably: you would probably have to write a separate str_word_count() for every language out there. So, given that the input may be ASCII or UTF-8, is there a generic multi-language function that counts words in any language?
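
One avenue that may be worth checking is ICU's word segmentation, exposed in PHP through IntlBreakIterator. The sketch below assumes the intl extension is installed; the helper name intl_word_count and its optional $locale parameter are hypothetical, introduced only for illustration. It counts only segments that ICU classifies as words (numbers, letters, kana, ideographs) and skips whitespace and punctuation. It is not a confirmed answer, and the segmentation quality for any particular language would still need to be verified.

```php
<?php
// Hypothetical helper, not part of PHP; requires the intl extension (ICU).
function intl_word_count(string $text, ?string $locale = null): int
{
    // A null locale lets ICU fall back to the default locale.
    $it = IntlBreakIterator::createWordInstance($locale);
    if ($it === null) {
        throw new RuntimeException('Could not create an ICU word break iterator.');
    }

    $it->setText($text);
    $it->first();

    $count = 0;
    // Walk every boundary; the rule status describes the segment that just ended.
    while ($it->next() !== IntlBreakIterator::DONE) {
        // Statuses below WORD_NONE_LIMIT are spaces and punctuation, not words.
        if ($it->getRuleStatus() >= IntlBreakIterator::WORD_NONE_LIMIT) {
            $count++;
        }
    }

    return $count;
}
```

Calling intl_word_count('Привет, мир!') or intl_word_count('Hello world') exercises the same code path for Cyrillic and ASCII input, so no per-language function is needed at the call site.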

edited Jun 27 '16 at 20:58


Nulik


How does your regex not work with ASCII? https://eval.in/596594 – chris85 Jun 27 '16 at 20:55
http://stackoverflow.com/a/19289518/1507679 This question and answer address a similar issue and might help solve your problem. The first answer even lists some of the other Unicode characters that are considered spaces. – Ryan Jun 27 '16 at 20:55


0 Answers