
I am working on a requirement to display (make readable) characters from the URL.

  • When I use Google Chrome, it displays the parameters in Chinese - even though they are encoded to UTF-8.

  • When I use Mozilla Firefox, it displays the parameters in Chinese - even though they are encoded to UTF-8.

  • When I use Internet Explorer, it displays the parameters in their percent-encoded form (the raw `%XX` UTF-8 bytes).

N.B. The URL is percent-encoded UTF-8; I know this because copying the URL from any of the three browsers and pasting it into Notepad++ always gives the following:

/%E6%89%93%E5%BC%80%E7%9B%AE%E5%BD%95/%E7%9B%B8%E6%9C%BA/%E6%95%B0%E7%A0%81%E7%9B%B8%E6%9C%BA/%E5%B0%8F%E5%9E%8B%E6%95%B0%E7%A0%81%E7%9B%B8%E6%9C%BA/PowerShot-A480/p/1934793

Could it be that Mozilla Firefox and Google Chrome have a feature that makes an encoded string readable, and that IE does not support it? Or is there a way to enable this in IE?

By the way... Going to View >> Encoding >> Unicode (UTF-8) takes care of the text inside of the page but does not make any difference for the text in the URL.

Any help will be greatly appreciated!

MrWhite
user1532449
  • What do you mean by "display characters from the URL"? Do you mean in the address bar? – chooban Jan 10 '13 at 08:11
  • Yes. When you pass parameters using GET, they are visible (and therefore readable). I am working with an application that sends parameters in Chinese, and those parameters should be readable. – user1532449 Jan 10 '13 at 14:22
  • You have no control over what a browser decides to display within its own address bar. It sounds like Chrome and Firefox are simply decoding the URL (converting any `%HH` sequences into raw bytes, then assuming the converted URL is UTF-8 - which is not always the case on all websites - and decoding it into Unicode) whereas IE displays it as-is without any decoding. – Remy Lebeau Jan 11 '13 at 02:58
  • I'd never seen this before. As Remy says though, it's dangerous to assume it's UTF-8 rather than any other character encoding. I wonder if it works for other, non-UTF8 encodings. – chooban Jan 17 '13 at 20:05
  • Edge now supports unicode characters in the address bar. – Muhammad Rehan Saeed Jun 22 '16 at 07:37
  • @MuhammadRehanSaeed: Like I've stated in [my answer below](http://stackoverflow.com/a/19542940/177710), Internet Explorer was/is also able to display non-ASCII characters in its address bar - it just doesn't decode them if they are encoded. – Oliver Oct 13 '16 at 05:58
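
The decoding described in the comments above (turn each `%HH` sequence into a raw byte, then try to read the bytes as UTF-8) can be sketched roughly as follows. This is only an illustration in Python, not what any browser actually runs; the function name `display_form` is made up for this example, and real browsers apply further rules that are not modeled here.

```python
from urllib.parse import unquote_to_bytes


def display_form(url_path: str) -> str:
    """Return a readable form of a percent-encoded path, or the input unchanged.

    Hypothetical helper for illustration only.
    """
    raw = unquote_to_bytes(url_path)  # step 1: every %HH becomes one raw byte
    try:
        return raw.decode("utf-8")    # step 2: assume the bytes are UTF-8
    except UnicodeDecodeError:
        return url_path               # not valid UTF-8: show the URL as-is


# The path from the question.
encoded = (
    "/%E6%89%93%E5%BC%80%E7%9B%AE%E5%BD%95/%E7%9B%B8%E6%9C%BA"
    "/%E6%95%B0%E7%A0%81%E7%9B%B8%E6%9C%BA"
    "/%E5%B0%8F%E5%9E%8B%E6%95%B0%E7%A0%81%E7%9B%B8%E6%9C%BA"
    "/PowerShot-A480/p/1934793"
)
print(display_form(encoded))

# A Latin-1 encoded "é" is not valid UTF-8, so this path is left untouched.
print(display_form("/caf%E9"))
```

The fallback branch is the reason the UTF-8 assumption is called risky above: a site that percent-encodes in another character encoding cannot be decoded this way, so the safest display is the encoded form.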

1 Answer


I've written a blog post about Internet Explorer not displaying the decoded version of non-ASCII characters and using IRIs to solve the problem.

As of today, we have the following situation:

  1. HTML5 supports IRIs, i.e. URIs with Unicode character support
  2. HTTP does not support IRIs, but all major browsers take care of converting IRIs to valid (encoded) URIs to retrieve the specified resource (page).
  3. IE supports IRIs in the href attribute of anchor tags and properly displays them in its address bar just like when you enter your URL by hand (keyboard ;-)).
  4. If you choose to percent-encode your IRI, thus making it a URI, IE will not decode that URI back into an IRI.
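
To make points 2 and 4 concrete, here is a rough sketch of the IRI-to-URI conversion, written in Python purely for illustration. The helper name `iri_to_uri` is an assumption of this example, and it simplifies things: it only percent-encodes the path, query and fragment as UTF-8 and leaves the host untouched, so internationalized domain names are not handled.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII parts of an IRI (hypothetical helper).

    Assumes an ASCII host; "%" is kept safe so that sequences which are
    already encoded are not encoded a second time.
    """
    parts = urlsplit(iri)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


iri = "http://zh.wikipedia.org/wiki/亦思巴奚兵乱"
uri = iri_to_uri(iri)
print(uri)           # ASCII-only form that is sent over HTTP
print(unquote(uri))  # decoding as UTF-8 gives the readable IRI back
```

The second `print` is the step that, according to point 4, IE does not perform for display when it is handed the already encoded form.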

So you could try the following:

  1. Save your HTML files using UTF-8. This allows you to insert any Unicode character into it.
  2. Do not percent-encode your URLs inside your HTML pages' links. Just use links like this: <a href="http://zh.wikipedia.org/wiki/亦思巴奚兵乱">亦思巴奚兵乱</a>

A great article on the topic can also be found at the W3C: An Introduction to Multilingual Web Addresses.

Oliver