I have a problem that I thought would be simple, but it's turning out to be quite complex.
I have a long UTF-8 string that is a mix of Roman, Western-European, Japanese, and Korean characters and punctuation. Many are multibyte chars, but some (I think) are not.
I need to do 2 things:
- Make sure there are no duplicate chars (and output that new string, stripped of dupes).
- Randomly shuffle that new string.
Here are the two functions I'm using:
    function uniquechars($string) {
        $l = mb_strlen($string);
        $unique = array();
        for($i = 0; $i < $l; $i++) {
            $char = mb_substr($string, $i, 1);
            if(!array_key_exists($char, $unique))
                $unique[$char] = 0;
            $unique[$char]++;
        }
        $uniquekeys = join('', array_keys($unique));
        return $uniquekeys;
    }
and:
    function unicode_shuffle($string)
    {
        $len = mb_strlen($string);
        $sploded = array();
        while($len-- > 0) {
            $sploded[] = mb_substr($string, $len, 1);
        }
        shuffle($sploded);
        $shuffled = join('', $sploded);
        return $shuffled;
    }
Using those two functions, which someone very helpfully provided, I thought I was all set. Curiously, though, the unique string (no duplicates) and the shuffled string do not seem to contain the same number of characters. I am checking this by highlighting the characters in my browser and then cutting and pasting them into another application. One string always has a different length than the one above it, and the difference varies from run to run, so it's not even the same number of characters getting truncated each time.
I'm sorry, I don't know enough about PHP or about coding in general to sleuth this out myself, but what on earth is going wrong here? It seems like it should be easy to shuffle a long string, but apparently it's much harder than I thought. Is there another, easier way to do this? Should I first convert the characters into their respective hex numbers, shuffle those, and then convert back to UTF-8? Should I output to a file rather than to the screen?
Anyone out there have suggestions? I'm sorry, I'm very new to this, so possibly I'm just doing something really dumb.