I have this PHP script:

$str = "ú";
echo $str . ' -> ' . urlencode($str) . "\n" ;

Expected Result:

ú -> %FA

Reference: http://www.w3schools.com/tags/ref_urlencode.asp

Actual Result:

ú -> %C3%BA
texai
  • What is your issue? You think the example is lying? – hakre Feb 15 '12 at 14:23
  • [Don't trust anything w3schools tells you](http://w3fools.com/) – DaveRandom Feb 15 '12 at 14:24
  • I expect that urlencode('ú') returns %FA instead of %C3%BA – texai Feb 15 '12 at 14:25
  • What char set is that letter in? As well, [w3fools](http://w3fools.com) – Marc B Feb 15 '12 at 14:25
  • @texai: It does, but not for every `ú`: how a character is represented in bytes depends on the charset encoding. In LATIN-1 it is encoded as `%FA`; in UTF-8 it is encoded as `%C3%BA`. – hakre Feb 15 '12 at 14:27
  • @hakre: What is the correct way to make my example script return my expected result? – texai Feb 15 '12 at 14:30
  • As an aside, what encoding does Windows cmd.exe use such that the above script would give me `%A3` instead? – Wiseguy Feb 15 '12 at 14:33
  • @texai: That highly depends on the charset encoding you want to use for `$str`. I would say, if you don't have any specific needs, go with UTF-8. But also see: [Handling character encodings in HTML and CSS](http://www.w3.org/International/tutorials/tutorial-char-enc/) and [Character encodings for beginners](http://www.w3.org/International/questions/qa-what-is-encoding). – hakre Feb 15 '12 at 14:34
  • @Wiseguy: Please see [What encoding/code page is cmd.exe using](http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using). But to get `%A3` you need to change the charset encoding of the PHP file (not of the shell, or at least not at first). – hakre Feb 15 '12 at 14:36

2 Answers

Try this:

urlencode(utf8_decode($str));

That should give you the expected result.
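
Note that `utf8_decode()` is deprecated as of PHP 8.2. A minimal sketch of the same conversion with `mb_convert_encoding()`, assuming the mbstring extension is loaded and `$str` holds valid UTF-8; the helper name `latin1_urlencode` is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper: convert UTF-8 input to ISO-8859-1 (LATIN-1),
// then URL-encode the resulting single-byte string.
function latin1_urlencode(string $utf8): string
{
    // Argument order: string, target encoding, source encoding
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return urlencode($latin1);
}

$str = "\xC3\xBA"; // ú written as explicit UTF-8 bytes, independent of the file's encoding
echo latin1_urlencode($str), "\n"; // %FA
```

Characters that do not exist in LATIN-1 cannot survive this conversion, so this only makes sense when the receiving side really expects LATIN-1.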

Jeremy Harris

Your source file stores the ú as UTF-8 (check the encoding of your example code), so urlencode correctly encodes its two bytes as %C3%BA.

What you had in mind was more or less this:

$str = "\xFA"; # ú in LATIN-1
echo $str . ' -> ' . urlencode($str) . "\n" ;

This gives you your expected result, regardless of how you encode the PHP file:

ú -> %FA
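
To make the difference visible, here is a small sketch that writes both encodings of ú as explicit byte escapes and prints the raw bytes next to the urlencode result. The variable names are only illustrative:

```php
<?php
$utf8   = "\xC3\xBA"; // ú encoded as UTF-8 (two bytes)
$latin1 = "\xFA";     // ú encoded as LATIN-1 (one byte)

// urlencode works on bytes, so each byte of ú becomes one %XX escape
echo bin2hex($utf8), ' -> ', urlencode($utf8), "\n";     // c3ba -> %C3%BA
echo bin2hex($latin1), ' -> ', urlencode($latin1), "\n"; // fa -> %FA
```

urlencode has no notion of characters here; it just escapes whatever bytes the string contains.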

Demo (that site stores the source code as UTF-8). If you want the output displayed as LATIN-1, this additional example signals the LATIN-1 charset to the browser:

header('Content-Type: text/html; charset=ISO-8859-1'); # ISO-8859-1 is the registered name for LATIN-1
$str = "\xFA"; # ú in LATIN-1
echo $str . ' -> ' . urlencode($str) . "\n" ;
hakre