Encoding a string containing German characters such as ä, ü, ö, ß with JavaScript's encodeURI() causes a weird bug after decoding it in PHP with rawurldecode(). The string looks correctly decoded, but it isn't. See the example screenshots from my IDE below.

[Three IDE screenshots]

Also, strlen() of the string decoded with rawurldecode() reports more characters than the string really has!

Problems occur when I need to process the decoded string, for example when I want to replace the German characters ä, ü, ö with ae, ue and oe. This can be seen in the example provided here.

I have also made a PHP fiddle where this whole weirdness can be seen.

What I've tried so far:

  • utf8_decode
  • iconv
  • the first two suggestions from here

Tudor Ravoiu

1 Answer

This is a Unicode equivalence issue, and it looks like your IDE doesn't handle multibyte strings very well.

In Unicode you can represent Ü with either:

  • the single precomposed code point (U+00DC), which is %C3%9C in percent-encoded UTF-8
  • or a capital U (U+0055) followed by a combining diaeresis (U+0308), which is %55%CC%88 in percent-encoded UTF-8
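
To make the two spellings concrete, here is a small JavaScript sketch (the variable names are mine, not from the question). It builds both forms of Ü and passes them through encodeURI(). Note that encodeURI() leaves the ASCII letter U as it is, so the decomposed form comes out as U%CC%88 rather than %55%CC%88; both decode to the same bytes.

```javascript
// Two canonically equivalent spellings of "Ü".
const composed = "\u00DC";     // single precomposed code point (NFC)
const decomposed = "U\u0308";  // "U" followed by COMBINING DIAERESIS (NFD)

// They render identically but are different strings.
const sameString = composed === decomposed;            // false
const lengths = [composed.length, decomposed.length];  // [1, 2] UTF-16 code units

// encodeURI() percent-encodes the UTF-8 bytes of non-ASCII characters
// and leaves ASCII letters such as "U" untouched.
const encodedComposed = encodeURI(composed);      // "%C3%9C"
const encodedDecomposed = encodeURI(decomposed);  // "U%CC%88"

console.log(sameString, lengths, encodedComposed, encodedDecomposed);
```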

Your GWT string uses the latter form, called NFD, while the one from PHP uses the former, called NFC. That's why your GWT string is 3 characters longer (strlen() counts bytes, and each decomposed umlaut takes one byte more than its precomposed form), even though both are valid encodings of logically identical Unicode strings. Your problem is that they are not identical byte for byte in PHP.
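
The length difference can be reproduced on the JavaScript side. This is only a sketch: the sample string "äüö" is an assumption chosen to have three umlauts, and utf8Bytes is a hypothetical helper that counts UTF-8 bytes, which is comparable to what PHP's strlen() reports.

```javascript
// Assumed sample: three umlauts, to mirror a difference of 3.
const nfc = "\u00E4\u00FC\u00F6";  // "äüö", precomposed
const nfd = nfc.normalize("NFD");  // each base letter followed by U+0308

// TextEncoder always produces UTF-8, so .length here is a byte count.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

const nfcBytes = utf8Bytes(nfc);  // 6: two bytes per precomposed umlaut
const nfdBytes = utf8Bytes(nfd);  // 9: one byte for the letter, two for U+0308

// Different bytes, but the same text once both are in one form.
const equalAfterNormalising = nfd.normalize("NFC") === nfc;  // true

console.log(nfcBytes, nfdBytes, equalAfterNormalising);
```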

More details about UTF-8 normalisation.

If you want to do preg replacements on the strings, you need to normalise them to the same form first. From your example I can see your IDE is using NFC, since it's the PHP string that works. So I suggest normalising to NFC in PHP (the default form for Normalizer::normalize()), then doing the preg_replace.
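
The same normalise-then-replace order can be sketched in JavaScript, the language of the encodeURI() call, rather than PHP. This is an illustration only: transliterateUmlauts is a hypothetical name, and the mapping covers just the lowercase ä, ö, ü mentioned in the question.

```javascript
// Hypothetical helper: compose to NFC first, then transliterate.
function transliterateUmlauts(input) {
  // Assumed mapping, limited to the three letters from the question.
  const map = { "\u00E4": "ae", "\u00F6": "oe", "\u00FC": "ue" };
  return input
    .normalize("NFC")  // "u" + U+0308 becomes the single code point U+00FC
    .replace(/[\u00E4\u00F6\u00FC]/g, (ch) => map[ch]);
}

// Both spellings of "Müller" give the same result.
console.log(transliterateUmlauts("M\u00FCller"));   // precomposed input
console.log(transliterateUmlauts("Mu\u0308ller"));  // decomposed input
```

Without the normalize() call the character class would not match the decomposed input, which is the failure described in the question.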

http://php.net/manual/en/normalizer.normalize.php

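function cleanImageName($name)
{
    // Normalizer comes from the intl extension; FORM_C composes "a" + U+0308 into a single "ä".
    $name = Normalizer::normalize($name, Normalizer::FORM_C);
    // Assumed example patterns (not from the original answer): transliterate the composed umlauts.
    $clean = preg_replace(['/\x{00E4}/u', '/\x{00F6}/u', '/\x{00FC}/u'], ['ae', 'oe', 'ue'], $name);
    return $clean;
}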

Otherwise you have to do something like this, which is based on this article.
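
One possible fallback when no normaliser is available is to match both spellings of each letter directly. The sketch below is an assumption about what such a fallback could look like, not necessarily what the linked article does; transliterateBothForms is a hypothetical name, and it is written in JavaScript to stay on the side that produces the string.

```javascript
// Hypothetical fallback: no normalisation step, so each pattern matches
// either the precomposed letter or the base letter followed by U+0308.
function transliterateBothForms(input) {
  return input
    .replace(/\u00E4|a\u0308/g, "ae")
    .replace(/\u00F6|o\u0308/g, "oe")
    .replace(/\u00FC|u\u0308/g, "ue");
}

console.log(transliterateBothForms("M\u00FCller"));   // precomposed input
console.log(transliterateBothForms("Mu\u0308ller"));  // decomposed input
```

This only covers the letters listed explicitly, so normalising first is usually the more robust choice.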

Phil