On a dedicated server:

$_POST['kannada']='ಕನ್ನಡ';
rawurlencode($_POST['kannada']);

gives

%26%233221%3B%26%233240%3B%26%233277%3B%26%233240%3B%26%233233%3B

On my local server:

$_POST['kannada']='ಕನ್ನಡ';
rawurlencode($_POST['kannada']);

gives

%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1

The expected result is the one I get on my local server. Why are the two results different?

OK. Below is the file I ran on both servers; you can check it.

<!DOCTYPE html>
<html lang="kn">
<head>
<meta charset="UTF-8" />
</head>
<body>
<form name="submit" method="post">
<input type="text" name="kannada">
<input type="submit" name="submit">
</form>
<?php
if(isset($_POST['submit']))
{
    echo $_POST['kannada']."<br/>";
    echo rawurlencode($_POST['kannada']);
}
?>
</body>
</html>
Krish Gowda

2 Answers

<?php

echo rawurldecode('%26%233221%3B%26%233240%3B%26%233277%3B%26%233240%3B%26%233233%3B') . PHP_EOL;
echo rawurldecode('%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1');

... prints:

&#3221;&#3240;&#3277;&#3240;&#3233;
ಕನ್ನಡ

Your two strings are simply different even though, when rendered in HTML context, they look the same.
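To make the difference concrete, here is a minimal sketch that compares the two decoded values at the byte level. The variable names are only illustrative; the two input strings are the ones from the question.

```php
<?php
// The two URL-encoded values reported in the question.
$fromServer = rawurldecode('%26%233221%3B%26%233240%3B%26%233277%3B%26%233240%3B%26%233233%3B');
$fromLocal  = rawurldecode('%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1');

// Different bytes, so the strings are not equal.
var_dump($fromServer === $fromLocal); // bool(false)

// 5 entities of 7 ASCII characters each vs. 5 characters of 3 UTF-8 bytes each.
echo strlen($fromServer), "\n"; // 35
echo strlen($fromLocal), "\n";  // 15

// Hex dump of the UTF-8 string: these are the bytes that rawurlencode() escapes.
echo bin2hex($fromLocal), "\n"; // e0b295e0b2a8e0b38de0b2a8e0b2a1
```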


Edit #1: It's actually possible to obtain HTML entities within your POST variables, but that's a browser feature: if the user types characters that are not supported by the document encoding into an HTML form, the browser generates HTML entities instead of sending or dropping the unsupported characters. But you do need a browser for that; it won't happen if you fill $_POST manually from PHP.
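If you want to spot this situation on the server, one rough approach is to look for numeric character references in the submitted value. The sketch below is only an illustration: `containsNumericEntities()` is a hypothetical helper, not a PHP built-in, and it cannot tell a browser fallback apart from a user who really typed `&#3221;`.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether a value contains
// decimal (&#3221;) or hexadecimal (&#x0C95;) numeric character references.
function containsNumericEntities(string $value): bool
{
    return preg_match('/&#(?:\d+|x[0-9a-f]+);/i', $value) === 1;
}

// What the dedicated server received from the browser.
$asSentByBrowser = '&#3221;&#3240;&#3277;&#3240;&#3233;';

// The same word as real UTF-8 text (equivalent to the literal in a UTF-8 file).
$asUtf8 = "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}";

var_dump(containsNumericEntities($asSentByBrowser)); // bool(true)
var_dump(containsNumericEntities($asUtf8));          // bool(false)
```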


Edit #2: As I was suspecting, the code you posted wasn't the actual code. Your input strings are not the same but you didn't notice because you are manipulating the form data as HTML:

echo $_POST['kannada']."<br/>";

... thus your HTML entities are being rendered as HTML. You need to do this:

echo htmlspecialchars($_POST['kannada'])."<br/>";
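As a small illustration of why the difference goes unnoticed, this sketch prints the same value with and without escaping. The hard-coded string stands in for the `$_POST['kannada']` value the dedicated server received.

```php
<?php
// Stand-in for the $_POST['kannada'] value received on the dedicated server.
$raw = '&#3221;&#3240;&#3277;&#3240;&#3233;';

// Echoed as-is, the browser interprets the entities and displays Kannada text,
// so the value looks identical to real UTF-8 input.
echo $raw, "\n";

// Escaped, the ampersands become &amp; and the browser displays the entities literally.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &amp;#3221;&amp;#3240;&amp;#3277;&amp;#3240;&amp;#3233;
```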

The rest is what I already said. Your page is not being interpreted as UTF-8. Make sure that:

  1. Your editor is saving files as UTF-8
  2. The web server is sending a correct Content-Type header. You can force it from PHP:

    header('Content-Type: text/html; charset=utf-8');
    
Álvaro González
  • Yes, I prefer the second one because it will be rendered properly in URLs. – Krish Gowda Jan 23 '14 at 12:01
  • It's actually more than a matter of preference. If you accept HTML from untrusted sources you might be open to XSS attacks unless you implement specially crafted security measures. If you only accept plain text everything's easier. – Álvaro González Jan 23 '14 at 12:03
  • For now security is not an issue. It is a simple website. My client wants nice-looking URLs. – Krish Gowda Jan 23 '14 at 12:21
  • It's a good policy. That way you can charge the customer every time the site gets hacked. :) – Álvaro González Jan 23 '14 at 12:29
  • You are right: the editor was not saving the file properly. I did this: File > Save As, checked the option "Include Unicode signature", then uploaded the file to the server and ran it. I got the second type of encoding, the one I used to get locally. – Krish Gowda Jan 23 '14 at 12:55
  • Unicode signature (BOM) is not mandatory for UTF-8 and it often just creates problems. Saving as UTF-8 without BOM (not ANSI or Unicode or anything else) should be enough. The most likely reason is an incorrect `Content-Type` header, something you can verify with your browser's developer tools. – Álvaro González Jan 23 '14 at 12:59
  • OK, I will do it. It's just that my editor was not showing those encoding options. I saved using Windows Notepad. And thank you so much, I learned something new today. – Krish Gowda Jan 23 '14 at 13:03

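The first result is not UTF-8: it is the URL-encoded form of HTML numeric character references (the Unicode code points written as &#NNNN;). You can convert them to UTF-8 with: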

mb_convert_encoding($unicode_content, 'UTF-8', 'HTML-ENTITIES');

http://phpfiddle.org/main/code/xkj-nyr

<?php
// HTML numeric character references (Unicode code points)
$a = rawurldecode("%26%233221%3B%26%233240%3B%26%233277%3B%26%233240%3B%26%233233%3B");

//UTF-8
$b = rawurldecode("%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1");

//Convert to utf-8
echo mb_convert_encoding($a, 'UTF-8', 'HTML-ENTITIES');
echo "\r\n";
echo $b;
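Note that the `'HTML-ENTITIES'` encoding name of mbstring is deprecated as of PHP 8.2. A possible alternative, shown here only as a sketch and not as part of the original answer, is `html_entity_decode()`, which also handles numeric character references when given a UTF-8 target charset. The variable `$decoded` is introduced for illustration.

```php
<?php
// Same inputs as above.
$a = rawurldecode("%26%233221%3B%26%233240%3B%26%233277%3B%26%233240%3B%26%233233%3B");
$b = rawurldecode("%E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1");

// Decode the numeric character references into UTF-8 bytes.
$decoded = html_entity_decode($a, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// After decoding, both strings hold the same UTF-8 bytes.
var_dump($decoded === $b); // bool(true)

// Re-encoding now yields the form seen on the local server.
echo rawurlencode($decoded), "\n"; // %E0%B2%95%E0%B2%A8%E0%B3%8D%E0%B2%A8%E0%B2%A1
```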
Álvaro González
Parfait