location.href on a domain with umlaut (ü) reports different domain

Question

Visiting the following domain: https://obs.bürgerhaus.de

In the browser console, if I check document.location.href, I get the following returned:

> document.location.href
"https://obs.xn--brgerhaus-q9a.de/"

Why is this value different than the actual domain? Is this some type of url encoding or something? How do I get the original domain with the umlaut in it?

https://en.wikipedia.org/wiki/Internationalized_domain_name it is punycode encoded to keep URLs limited to ASCII as they were from the beginning - To parse it back, this library could help: https://github.com/bestiejs/punycode.js/ or look at this solution from SO: https://stackoverflow.com/a/301287/3977134 — r3dst0rm, Apr 05 '19 at 12:47
@r3dst0rm Thanks. You can post the answer and I will accept. — kspearrin, Apr 05 '19 at 13:08

score 1 · Accepted Answer · answered Apr 05 '19 at 13:35

The Domain Name System, which performs a lookup service to translate user-friendly names into network addresses for locating Internet resources, is restricted in practice to the use of ASCII characters, a practical limitation that initially set the standard for acceptable domain names.

(see: https://en.wikipedia.org/wiki/Internationalized_domain_name)

As the article says, the domain names we use every day are technically limited to ASCII characters. To support more characters, Unicode domain names are encoded into so-called Punycode (see RFC 3492: https://www.ietf.org/rfc/rfc3492.txt).

Visiting a website whose domain contains an umlaut (or a similar non-ASCII character) makes the browser encode it. For example, http://öbb.at is transformed to http://xn--bb-eka.at. The transformed form is called ASCII Compatible Encoding (ACE); it consists of the four-character prefix ( xn-- ) followed by the Punycode representation of the Unicode characters. See more details here ...
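To see this transformation without visiting the site, the standard URL constructor can be used, since it applies the same host encoding when it parses a URL. This is only a small sketch; the helper name toAsciiHost is made up for illustration, and it assumes an environment with the WHATWG URL API (current browsers and Node.js).

```javascript
// Hypothetical helper: returns the ASCII (ACE) form of a URL's hostname.
// The URL constructor converts non-ASCII labels to their xn-- form on parsing.
function toAsciiHost(urlString) {
  return new URL(urlString).hostname;
}

console.log(toAsciiHost("http://öbb.at"));             // "xn--bb-eka.at"
console.log(toAsciiHost("https://obs.bürgerhaus.de")); // "obs.xn--brgerhaus-q9a.de"
```

This is why location.href shows the xn-- form: the browser stores the already-encoded host.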

To parse it back, you could look into:

Punycode JS on GitHub

Solution from some - StackOverflow
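If pulling in a library is not an option, the decoding step can also be sketched by hand. The code below is a minimal decoder following the algorithm in RFC 3492, without the overflow checks and IDNA validation a real library performs, so treat it as an illustration and not a replacement for punycode.js. The function names (decodePunycodeLabel, hostToUnicode) are made up for this example.

```javascript
// Punycode parameters as defined in RFC 3492, section 5.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Map a code unit to its Punycode digit value: a-z = 0..25, 0-9 = 26..35.
function digitValue(code) {
  if (code >= 0x30 && code <= 0x39) return code - 0x30 + 26;
  if (code >= 0x41 && code <= 0x5a) return code - 0x41;
  if (code >= 0x61 && code <= 0x7a) return code - 0x61;
  return BASE; // marks an invalid digit
}

// Bias adaptation function from RFC 3492, section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Decode one label that has already had its "xn--" prefix removed.
function decodePunycodeLabel(input) {
  const output = [];
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  // Everything before the last "-" is copied literally (the basic code points).
  const basic = Math.max(input.lastIndexOf("-"), 0);
  for (let j = 0; j < basic; j++) output.push(input.charCodeAt(j));
  let index = basic > 0 ? basic + 1 : 0;
  while (index < input.length) {
    const oldI = i;
    let w = 1;
    for (let k = BASE; ; k += BASE) {
      if (index >= input.length) throw new RangeError("Truncated Punycode input");
      const digit = digitValue(input.charCodeAt(index++));
      if (digit >= BASE) throw new RangeError("Invalid Punycode digit");
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const length = output.length + 1;
    bias = adapt(i - oldI, length, oldI === 0);
    n += Math.floor(i / length);
    i %= length;
    output.splice(i++, 0, n); // insert code point n at position i
  }
  return String.fromCodePoint(...output);
}

// Convert every "xn--" label of a hostname back to Unicode.
function hostToUnicode(hostname) {
  return hostname
    .split(".")
    .map((label) =>
      label.toLowerCase().startsWith("xn--") ? decodePunycodeLabel(label.slice(4)) : label
    )
    .join(".");
}

console.log(hostToUnicode("obs.xn--brgerhaus-q9a.de")); // "obs.bürgerhaus.de"
```

In the browser console, hostToUnicode(location.hostname) would then give the host as it was typed, umlaut included.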
