If I use the PHP function str_word_count() on Russian text, it returns an incorrect result. A workaround is to use something like:

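function my_word_count($str) {
    // Split on runs of anything that is not a Unicode letter, digit, or apostrophe.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces produced by leading/trailing separators.
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str, -1, PREG_SPLIT_NO_EMPTY));
}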
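
To make the behaviour concrete, here is a minimal sketch of the same regex idea, assuming UTF-8 input. The name count_words_u is hypothetical, and it counts matches with preg_match_all() instead of splitting, so leading or trailing punctuation cannot add empty pieces. It is only an illustration of the approach, not a complete solution:

```php
<?php
// Hypothetical helper: counts runs of Unicode letters, digits, and apostrophes.
// Assumes $str is valid UTF-8; on invalid input preg_match_all() returns false,
// which the cast turns into 0.
function count_words_u(string $str): int {
    return (int) preg_match_all('~[\p{L}\p{N}\']+~u', $str);
}

echo count_words_u("Привет, мир! Как дела?"), "\n"; // 4
echo count_words_u("don't stop"), "\n";             // 2
echo count_words_u(""), "\n";                       // 0
```

This still treats a "word" as a run of letters and digits between separators, so it relies on the text using spaces or punctuation between words.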

However, this regex-based function may not work for text in other languages, which complicates the task considerably: you would probably have to write a separate str_word_count() for every language out there. So, given that the input may be ASCII or UTF-8, is there a generic multi-language function that counts words in any language?

Nulik
  • How does your regex not work with ASCII? https://eval.in/596594 – chris85 Jun 27 '16 at 20:55
  • http://stackoverflow.com/a/19289518/1507679 This question and answer address a similar issue and might help solve your problem. The first answer even lists some of the other Unicode characters that are considered spaces. – Ryan Jun 27 '16 at 20:55

0 Answers