Using PHP 5.3.2, I'm having trouble handling a request for a page whose name contains an umlaut: ö

When I request the test_ö_test.htm page using Firefox + Live HTTP Headers, I can see that Firefox automatically percent-encodes the umlaut in the request:

GET /test_%C3%B6_test.htm HTTP/1.1

Now, using http://meyerweb.com/eric/tools/dencoder/ I am able to encode/decode between test_ö_test.htm and test_%C3%B6_test.htm, so I figure that encoding is correct.

Using PHP's urldecode(), I get test_Ã¶_test.htm

And the hated 404 is returned. Note that test_ö_test.htm does exist on the file system.

When I test with JavaScript's escape(), I get test_%F6_test.htm. When I plug that into my browser, the content page is returned successfully, and urldecode() turns it back into the umlaut.
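
To make the difference between the two escapes concrete, here is a minimal sketch that dumps the raw bytes urldecode() produces for each one. It only uses urldecode() and bin2hex(); the variable names are illustrative, not taken from the original code.

```php
<?php
// urldecode() only replaces each %XX escape with the raw byte it names;
// it does not know or care which character set those bytes belong to.
$utf8Bytes   = urldecode('%C3%B6'); // escape sent by Firefox
$latin1Bytes = urldecode('%F6');    // escape produced by JavaScript's escape()

echo bin2hex($utf8Bytes), "\n";   // c3b6 (two bytes)
echo bin2hex($latin1Bytes), "\n"; // f6   (one byte)

// Different byte strings, so they cannot both match the same stored file name.
var_dump($utf8Bytes === $latin1Bytes); // bool(false)
```

The same character ö is therefore two bytes in the Firefox request and one byte in the escape() version, which would explain why only one of the two URLs finds the file.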

hakre
starmonkey
  • urldecode() is single-byte, while %C3%B6 seems multi-byte. Anyway, why not encode your page names on the server side? Or, even better, avoid extended characters altogether. – Your Common Sense Oct 27 '10 at 01:20
  • possible duplicate of [URL Decoding in PHP](http://stackoverflow.com/questions/1756862/url-decoding-in-php) – Nick Presta Oct 27 '10 at 01:21
  • Yes that is basically the same issue - happy to remove this if desired. – starmonkey Oct 27 '10 at 02:59

1 Answer

Your page is declared as ISO-8859-1, while your data is UTF-8 encoded. As a result, the browser interprets the two-byte UTF-8 sequence 0xC3 0xB6 as the two-character Latin-1 sequence "LATIN CAPITAL LETTER A WITH TILDE" "PILCROW SIGN" (Ã¶). Your data and the content encoding of the page need to agree.
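
As a rough illustration, the sketch below reproduces the misreading and then converts the decoded name so both sides agree. It assumes ISO-8859-1 is the encoding you want to end up with (suggested by the %F6 URL working, but not confirmed) and that the iconv extension is available; the variable names are hypothetical.

```php
<?php
$requested = urldecode('test_%C3%B6_test.htm'); // UTF-8 bytes from the browser

// What an ISO-8859-1 page shows: each byte is treated as its own character.
// The Latin-1 reading is re-encoded as UTF-8 here so the result can be inspected.
$displayed = iconv('ISO-8859-1', 'UTF-8', $requested);
echo bin2hex(substr($displayed, 5, 4)), "\n"; // c383c2b6, i.e. "Ã" followed by "¶"

// Making both sides agree (assumption: Latin-1 is the target encoding).
$latin1Name = iconv('UTF-8', 'ISO-8859-1', $requested);
var_dump($latin1Name === urldecode('test_%F6_test.htm')); // bool(true)
```

Going the other way, declaring the page as UTF-8 and keeping UTF-8 everywhere, is equally valid; the point is only that one encoding is used consistently.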

TML