
For a few days I have been reading about character encoding, and I want to serve all my pages as UTF-8 for compatibility. But I get stuck when I try to convert user input to UTF-8: it works in all browsers except Internet Explorer (as always).

I don't know what's wrong with my code; it seems fine to me.

  • I set the header with the charset
  • I saved the file as UTF-8 (no BOM)

This happens only if you pass the value via $_GET in Internet Explorer, e.g. myscript.php?c=äüöß. When I write special characters directly into my page, they are displayed correctly.

This is my code:

<?php
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');

$_GET = userToUtf8($_GET);

function userToUtf8($string) {
    if(is_array($string)) {
        $tmp = array();
        foreach($string as $key => $value) {
            $tmp[$key] = userToUtf8($value);
        }
        return $tmp;
    }

    return userDataUtf8($string);
}

function userDataUtf8($string) {
    print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
    $string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
    print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
    // Remove 4-byte UTF-8 sequences (a lead byte F0-F7 followed by any three bytes)
    $string = preg_replace('/[\xF0-\xF7].../s', '', $string);
    print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII

    return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
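
To see which bytes actually arrive in `$_GET['c']`, one could dump them as hex and check them strictly, instead of relying on `mb_detect_encoding()`, which only guesses from a list of candidate encodings. A minimal sketch: the helper name `describeBytes()` is hypothetical, and the ISO-8859-1 input below is an assumption about what IE might send from the address bar, not a confirmed trace.

```php
<?php
// Hypothetical debugging helper: report the raw bytes and whether they form valid UTF-8.
function describeBytes(string $s): array {
    return [
        'hex'        => bin2hex($s),                    // two hex digits per byte
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'), // strict validity check
    ];
}

// "äüöß" as UTF-8 bytes (what a file saved as UTF-8 contains).
$utf8 = "\u{e4}\u{fc}\u{f6}\u{df}";
// The same letters as ISO-8859-1 bytes (assumed IE address-bar input).
$latin1 = "\xE4\xFC\xF6\xDF";

var_dump(describeBytes($utf8));   // hex "c3a4c3bcc3b6c39f", valid_utf8 true
var_dump(describeBytes($latin1)); // hex "e4fcf6df", valid_utf8 false
```

In a real request one would call `describeBytes($_GET['c'])`; if `valid_utf8` is false, the browser sent something other than UTF-8, and the value has to be converted from that actual encoding.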

The most confusing part is that it reports the string as converted from UTF-8 to ASCII... Can someone tell me why the special characters aren't displayed correctly? What's wrong here? Or is this a bug in Internet Explorer?

Edit: If I disable the conversion, it reports that everything is UTF-8, but the characters still aren't displayed... They appear as "????".

Note: This happens ONLY in Internet Explorer!

Chad Nouis
Petschko
  • What version of IE are you using? The three characters that you retrieve via GET are the ones you need to convert, right? – Answers_Seeker Jul 15 '15 at 13:05
  • That's right. I use IE version 11.0.9600. But I want it to work with all UTF-8 characters, like it does in other browsers – Petschko Jul 15 '15 at 13:08
  • Will it display correctly if you change the encoding of IE to utf8? – frz3993 Jul 15 '15 at 13:26
  • Is this the entire PHP file? For HTML content, you need to put it in a `` tag. For non-UTF8 browsers (IE), you need to specify the charset of the document in a ``. – light Jul 15 '15 at 13:28
  • @frz3993 IE tells me it is UTF-8; I can't change the encoding because it's already set to UTF-8 (right-click -> Encoding -> UTF-8 is checked)... @light it's set by PHP, but I also tried adding this line (as the first PHP output) `echo "";` and got the same result – Petschko Jul 15 '15 at 13:35
  • It seems to work if `c` is URL-encoded: `c=%C3%A4%C3%BC%C3%B6%C3%9F`. – frz3993 Jul 15 '15 at 14:00
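
The percent-encoded value in that comment is simply the UTF-8 bytes of `äüöß`, so PHP can produce it itself when building links. A small sketch, assuming the script file is saved as UTF-8; the `myscript.php` target mirrors the question and the link text is illustrative.

```php
<?php
// Source file assumed to be saved as UTF-8, so this literal holds UTF-8 bytes.
$value = "\u{e4}\u{fc}\u{f6}\u{df}"; // "äüöß"

// Each UTF-8 byte becomes %XX, so the URL itself is plain ASCII.
$encoded = rawurlencode($value);              // "%C3%A4%C3%BC%C3%B6%C3%9F"
$query   = http_build_query(['c' => $value]); // "c=%C3%A4%C3%BC%C3%B6%C3%9F"

echo '<a href="myscript.php?' . htmlspecialchars($query) . '">test</a>';

// PHP decodes the query string before filling $_GET, which restores the UTF-8 bytes.
var_dump(rawurldecode($encoded) === $value);  // bool(true)
```

This only helps for links the site generates itself; text typed by hand into IE's address bar is still encoded however the browser chooses.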

2 Answers


Although I prefer using URL-encoded strings in the address bar, in your case you can try to encode $_GET['c'] to UTF-8, e.g.:

$_GET['c'] = utf8_encode($_GET['c']);
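
`utf8_encode()` always assumes its input is ISO-8859-1, so applying it to a value that is already UTF-8 (as other browsers send) would double-encode it; the function is also deprecated as of PHP 8.2. A hedged sketch that converts only when the input is not valid UTF-8, assuming the non-UTF-8 case is ISO-8859-1 (IE may actually use Windows-1252, which agrees with ISO-8859-1 for these letters). `toUtf8()` is a hypothetical name.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise treat the bytes as ISO-8859-1.
function toUtf8($value) {
    if (is_array($value)) {
        // Recurse into nested arrays such as ?a[]=...; array_map keeps the keys here.
        return array_map('toUtf8', $value);
    }
    if (!is_string($value) || mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Assumption: non-UTF-8 input is ISO-8859-1 (same result as utf8_encode()).
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$_GET = toUtf8($_GET);
```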
frz3993

An approach that worked for displaying the characters in IE 11.0.18:

  • Look up the Unicode code point of your character: for example, 'ü' is 'U+00FC'

  • Following this post, convert it to a UTF-8 entity

  • Decode it with utf8_decode before dumping it

The line of code illustrating this with the 'ü' character is:

var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
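
The one-liner above can be unpacked step by step. This sketch only restates it with intermediate values, and uses `mb_convert_encoding()` in place of `utf8_decode()` (equivalent for this character, and not deprecated in PHP 8.2); the `mb_chr()` shortcut is an extra alternative, not part of the original answer.

```php
<?php
// Step 1: turn the code point notation "U+00FC" into a numeric HTML entity.
$entity = preg_replace('/U\+([0-9A-F]{4})/', '&#x$1;', 'U+00FC'); // "&#x00FC;"

// Step 2: decode the entity into a UTF-8 string.
$utf8 = html_entity_decode($entity, ENT_NOQUOTES, 'UTF-8'); // bytes C3 BC

// Step 3: re-encode as ISO-8859-1, which is what utf8_decode() does.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // byte FC

// Since PHP 7.2, steps 1 and 2 can also be done directly from the code point.
$direct = mb_chr(0xFC, 'UTF-8');

var_dump(bin2hex($utf8), bin2hex($latin1), $direct === $utf8);
```

Note that step 3 yields ISO-8859-1 bytes, which would not be valid in a page served with the `charset=utf-8` header from the question; that is the trade-off raised in the comments below.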

To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.

Other resources: a post on retrieving a character's Unicode code point

Community
Answers_Seeker
  • `utf8_decode()` produces an ISO-8859-1 string. It seems weird to me to convert it back to ISO-8859-1. I think the point of UTF-8 is to avoid workarounds like this and to display text correctly no matter where the user comes from. Is this the only solution? – Petschko Jul 15 '15 at 13:41
  • If I don't call utf8_decode, I get 'ü', which is less readable than the ISO string :S I was just trying to help – Answers_Seeker Jul 15 '15 at 13:43
  • Yes, thanks for that ^^ But I hope there is a cleaner solution than this. If I end up using that ISO string anyway, why bother with UTF-8 at all? I hope you understand :) – Petschko Jul 15 '15 at 13:48