I'm using a script to calculate keyword density. Everything works except for words containing non-English letters (Swedish, Estonian, or anything else).
$file includes the text.
Here's where the problem comes in:
$testsource = explode(" ", $file); // This has no problems with non-English letters
FIRST WORD in array: "Mängi"
$source = preg_split("/[(\b\W+\b)]/", $file, 0, PREG_SPLIT_NO_EMPTY); // This sometimes removes the non-English letter and also the letter in front of it
FIRST WORD in array: "ngi"
For this specific word the problem seems to be the "ä" character (for other words it is whichever non-English character they contain): my current preg_split removes the "Mä" from the beginning of the word. Words with no special characters are fine.
Question: What can I change in the preg_split pattern so that it no longer breaks words with non-English letters?
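
One possible direction, as a minimal sketch rather than a definitive fix. It assumes $file is UTF-8 encoded, and the sample string below is made up for illustration. Without the u modifier, PCRE works byte by byte, and "ä" in UTF-8 is two bytes that \W treats as non-word characters, which would explain the split in the middle of the word. Also note that the square brackets in the pattern above turn it into a character class, so inside them \b means a backspace character and (, + and ) are literals. The sketch splits on runs of anything that is not a Unicode letter, a digit, or an underscore:

```php
<?php
// Hypothetical sample input; assumed to be valid UTF-8.
$file = "Mängi nüüd, öösel ja päeval!";

// \p{L} matches any Unicode letter, \p{N} any Unicode digit.
// The u modifier makes PCRE treat pattern and subject as UTF-8
// instead of as single bytes.
$source = preg_split('/[^\p{L}\p{N}_]+/u', $file, -1, PREG_SPLIT_NO_EMPTY);

// With the u modifier, preg_split returns false if the subject
// is not valid UTF-8, so fall back to an empty list in that case.
if ($source === false) {
    $source = [];
}

print_r($source);
```

With the sample string this gives "Mängi", "nüüd", "öösel", "ja", "päeval". Apostrophes and hyphens inside words are still treated as separators here; whether that suits a keyword density count is a separate decision.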