
Why does my urlencode() produce something different than I expected?

My expectations might simply be wrong, but if so I would be even more puzzled.

example

urlencode("ä");

Expected: returns %C3%A4

Actual: returns %E4

Where have I gone wrong in my expectations? It seems to be linked to encoding, but I'm not very familiar with what I should do or use.

Should I change something on my server so that the function uses the right encoding?

halfer
marn
  • Not sure: http://sandbox.onlinephpfunctions.com/code/2a9f0a4606f0fac02c0874b75a849fc5143f4f6e – AbraCadaver Mar 02 '15 at 16:46
  • Thanks for that site. But it seems to confirm my expectations and pushes me deeper into confusion about my own PHP's behaviour. Why would mine give back the wrong data? – marn Mar 02 '15 at 16:57
  • Maybe your file is not UTF-8 encoded? Not really sure. – AbraCadaver Mar 02 '15 at 16:59
  • According to the PHP manual (http://php.net/manual/en/function.urlencode.php), the output of urlencode is "a percent (%) sign followed by two hex digits". Looks like that's exactly what you are getting. And the value matches various encoding tables available online. –  Mar 02 '15 at 17:02

1 Answer

urlencode encodes the raw bytes in your string into a percent-encoded representation. If you expect %C3%A4 that means you expect the UTF-8 byte representation of "ä". If you get %E4 that means your string is actually encoded in ISO-8859-1 instead.
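To make the byte-level behaviour concrete, here is a minimal sketch. It writes the bytes with explicit escape sequences so the result does not depend on how the script file itself is saved; the variable names are only illustrative.

```php
<?php
// The same character "ä" as two different byte sequences.
$utf8   = "\xC3\xA4"; // UTF-8: two bytes
$latin1 = "\xE4";     // ISO-8859-1: one byte

// urlencode() works on the bytes and knows nothing about characters.
echo urlencode($utf8), "\n";   // %C3%A4
echo urlencode($latin1), "\n"; // %E4

// bin2hex() shows which bytes a string really contains.
echo bin2hex($utf8), "\n";     // c3a4
echo bin2hex($latin1), "\n";   // e4
```

Running bin2hex() on your own string is a quick way to check which encoding you actually have.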

Encode your string in UTF-8 to get the expected result. How to do this depends on where this string comes from. If it's a string literal in your source code file, save the file as UTF-8 in your text editor. If it comes from a database, see UTF-8 all the way through.
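If you cannot fix the encoding at the source, converting the string before encoding it is one possible workaround. The sketch below assumes the mbstring extension is available and that any input which is not valid UTF-8 is ISO-8859-1; to_utf8 is a hypothetical helper name, not something PHP provides.

```php
<?php
// Hypothetical helper: return the string as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo urlencode(to_utf8("\xE4")), "\n";     // %C3%A4 (converted from ISO-8859-1)
echo urlencode(to_utf8("\xC3\xA4")), "\n"; // %C3%A4 (was already UTF-8)
```

The validity check is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so knowing the real source encoding is more reliable than guessing.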

For more background information, see What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.

Community
deceze
  • Thank you very much, my file format had defaulted to ANSI. When I saved it as UTF-8, it solved my problem. – marn Mar 02 '15 at 17:16