Most browsers, such as Firefox and Chrome, apply what I'll call Unicode normalization to URLs before requesting them (strictly speaking, they percent-encode the UTF-8 bytes of the non-ASCII characters). For example, when Chrome or Firefox opens this link:
http://fa.wikipedia.org/wiki/سید_محمد_خاتمی
which contains Persian Unicode characters, they automatically convert it into:
http://fa.wikipedia.org/wiki/%D8%B3%DB%8C%D8%AF_%D9%85%D8%AD%D9%85%D8%AF_%D8%AE%D8%A7%D8%AA%D9%85%DB%8C
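The conversion above can be reproduced with Python's standard library. This is a minimal sketch that assumes the browser encodes the path as UTF-8, which is consistent with the bytes shown above. Each character is turned into its UTF-8 bytes, and each byte outside the unreserved ASCII set is written as %XX. It illustrates the transformation only. It does not show how any particular browser implements it.

```python
from urllib.parse import quote, unquote

# Path segment from the link above (Persian words joined by underscores).
path = "سید_محمد_خاتمی"

# quote() encodes the string as UTF-8 (its default) and percent-escapes every
# byte that is not an unreserved ASCII character; "_" is left unchanged.
encoded = quote(path)
print("http://fa.wikipedia.org/wiki/" + encoded)

# The escaping is reversible: decoding the %XX bytes as UTF-8 restores the text.
assert unquote(encoded) == path
```

Running it should print the same percent-encoded URL that the browsers request.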
I want to modify the hyperlinks on my website so that browsers do not normalize the Unicode characters. In other words, when a user clicks a link, the original (unencoded) URL should be requested from the server.
Is there any trick for this, e.g. a small piece of JavaScript in the page that links to such URLs?
UPDATE: When I request the URL from a program, e.g. with Java's HttpURLConnection, it requests the original URL without any normalization (unless I explicitly call UrlNormalizer.normalize(url)). However, most browsers and Linux's GET command do apply the normalization.