I'm using PHP 5.3 and I want to count the words in some text for validation purposes. My problem is that the JavaScript function I use to validate the text returns a different word count than my PHP code does.
Here is the PHP code:
// decode HTML entities and strip tags
$text = strip_tags(html_entity_decode($text,ENT_QUOTES));
// replace numbers with X
$text = preg_replace('/\d/', 'X', $text);
// remove '.', ',', '-' and '&'
$text = str_replace(array('.',',','-','&'), '', $text);
// number of words
$count = str_word_count($text);
I noticed that with PHP 5.5 I get the correct word count, but with PHP 5.3 I do not. I searched for this and found this link (http://grokbase.com/t/php/php-bugs/12c14e0y6q/php-bug-bug-63663-new-str-word-count-does-not-properly-handle-non-latin-characters), which describes a bug in how str_word_count handles non-Latin characters. I tried to work around it with this code:
// remove everything outside the printable ASCII range (\x20-\x7F)
$text = preg_replace('/[^(\x20-\x7F)]*/','', $text);
But I still didn't get the right result. The word count was usually very close to the expected value and sometimes exact, but it was often off.
I decided to write another PHP routine to work around the bug. Here is the PHP code:
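One possible contributor to the remaining mismatch, offered as a guess rather than a confirmed diagnosis: the pattern above also removes control characters below \x20, which includes line breaks and tabs. Two words separated only by a newline are therefore glued together before they are counted. The sample string below is made up for illustration.

```php
<?php
// Hypothetical sample input: two words separated only by a line break.
$text = "one\ntwo";

// Same pattern as in the question: everything outside \x20-\x7F is removed,
// and the newline (\x0A) falls outside that range.
$stripped = preg_replace('/[^(\x20-\x7F)]*/', '', $text);

echo $stripped, "\n";                 // the newline is gone: "onetwo"
echo str_word_count($stripped), "\n"; // counted as a single word
```

Words written entirely in non-ASCII characters are also deleted outright by this pattern, so they can no longer be counted at all.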
// decode HTML entities and strip tags
$text = strip_tags(html_entity_decode($text,ENT_QUOTES));
// replace one or more consecutive line breaks with a single space
$text = preg_replace("/[\n]+/", " ", $text);
// replace one or more consecutive whitespace characters with a separator string (@SEPARATOR@)
$text = preg_replace("/[\s]+/", "@SEPARATOR@", $text);
// split on the separator string (@SEPARATOR@) to get an array of words
$text_array = explode('@SEPARATOR@', $text);
// the number of array elements is the number of words
$count = count($text_array);
// if the last element of the array is empty, decrease the count by one
$last_key = end($text_array);
if (empty($last_key)) {
    $count--;
}
This second version works fine for me, and I would like to ask two questions:
- What could I do about the str_word_count function in the first approach?
- Is my second solution accurate, or could I do something to improve it?
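
For the second question, here is a minimal sketch of one possible simplification. It assumes a word is simply a run of non-whitespace characters. The function name count_words_by_whitespace is hypothetical; it uses preg_split with the PREG_SPLIT_NO_EMPTY flag in place of the separator string and the check on the last element.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of the code above.
function count_words_by_whitespace($text)
{
    // Decode HTML entities and strip tags, exactly as in the second snippet.
    $text = strip_tags(html_entity_decode($text, ENT_QUOTES));

    // Split on runs of whitespace. PREG_SPLIT_NO_EMPTY drops the empty pieces
    // that leading or trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure; treat that as zero words.
    return $words === false ? 0 : count($words);
}

// Leading whitespace, markup, blank lines and a final word "0".
echo count_words_by_whitespace("  Hello <b>world</b>\n\nfoo 0 "), "\n"; // 4
```

For that sample input the separator-based version appears to return 5 instead of 4: the leading whitespace produces an empty first element that is never subtracted. Separately, empty() treats the string "0" as empty, so a text ending in the word "0" with no trailing whitespace would be undercounted by one.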