
Visiting the following domain: https://obs.bürgerhaus.de

In the browser console, if I check document.location.href, I get the following returned:

> document.location.href
"https://obs.xn--brgerhaus-q9a.de/"

Why is this value different than the actual domain? Is this some type of url encoding or something? How do I get the original domain with the umlaut in it?

Daniel A. White
kspearrin
  • 1
    https://en.wikipedia.org/wiki/Internationalized_domain_name it is punycode encoded to keep URLs limited to ASCII as they were from the beginning - To parse it back, this library could help: https://github.com/bestiejs/punycode.js/ or look at this solution from SO: https://stackoverflow.com/a/301287/3977134 – r3dst0rm Apr 05 '19 at 12:47
  • @r3dst0rm Thanks. You can post the answer and I will accept. – kspearrin Apr 05 '19 at 13:08

1 Answer


"The Domain Name System, which performs a lookup service to translate user-friendly names into network addresses for locating Internet resources, is restricted in practice to the use of ASCII characters, a practical limitation that initially set the standard for acceptable domain names."

(see: https://en.wikipedia.org/wiki/Internationalized_domain_name)

As the article says, the domains we use every day are technically limited to ASCII characters. To support more characters, Unicode domain names are encoded into so-called Punycode (see RFC 3492: https://www.ietf.org/rfc/rfc3492.txt).

Visiting a website whose domain contains an umlaut (or another non-ASCII character) makes the browser encode the domain. For example, http://öbb.at is transformed to http://xn--bb-eka.at. The transformed form is called ASCII Compatible Encoding (ACE) and consists of the four-character prefix xn-- followed by the Punycode representation of the Unicode characters. See more details here ...
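To see the transformation for yourself, here is a small sketch using the standard URL API, which is available in current browsers and in Node.js. The variable names are only for illustration. The parser converts the Unicode host to its ACE form, which is the same thing you observe in document.location.href:

```javascript
// Parsing a URL converts a Unicode host name to its ASCII (ACE) form.
const parsed = new URL('http://öbb.at');

// hostname holds the encoded host; href is the full serialized URL.
const aceHostname = parsed.hostname;
console.log(aceHostname); // xn--bb-eka.at
console.log(parsed.href); // http://xn--bb-eka.at/
```

Note that the URL API only goes in this direction: it gives you the ASCII form, not the original Unicode spelling.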

To parse it back, you could look into:

Punycode JS on GitHub (https://github.com/bestiejs/punycode.js/)

Solution from some - StackOverflow (https://stackoverflow.com/a/301287/3977134)
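If you would rather not pull in a library, the decoding step from RFC 3492 is short enough to sketch by hand. The following is a minimal, simplified decoder, not the code from either link above. The function names (adapt, digitValue, decodeLabel, hostnameToUnicode) are made up for this example, and it skips the overflow checks and IDNA validation that a production implementation should have:

```javascript
// Punycode parameters from RFC 3492, section 5.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation function from RFC 3492, section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Map a Punycode digit to its value: a-z => 0..25, 0-9 => 26..35.
function digitValue(ch) {
  const c = ch.toLowerCase().charCodeAt(0);
  if (c >= 97 && c <= 122) return c - 97;
  if (c >= 48 && c <= 57) return c - 48 + 26;
  throw new RangeError('Invalid Punycode digit: ' + ch);
}

// Decode a single label body, i.e. the part after the "xn--" prefix.
function decodeLabel(input) {
  const output = [];
  // Everything before the last "-" is copied literally (the basic code points).
  const lastDash = input.lastIndexOf('-');
  if (lastDash > 0) {
    for (const ch of input.slice(0, lastDash)) output.push(ch.codePointAt(0));
  }
  let pos = lastDash > 0 ? lastDash + 1 : 0;
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;

  while (pos < input.length) {
    const oldI = i;
    let w = 1;
    // Read one variable-length integer (the delta).
    for (let k = BASE; ; k += BASE) {
      if (pos >= input.length) throw new RangeError('Truncated Punycode input');
      const digit = digitValue(input[pos++]);
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    bias = adapt(i - oldI, output.length + 1, oldI === 0);
    // The delta encodes both the code point (n) and the insert position (i).
    n += Math.floor(i / (output.length + 1));
    i %= output.length + 1;
    output.splice(i++, 0, n);
  }
  return String.fromCodePoint(...output);
}

// Decode every "xn--" label of a host name and leave other labels untouched.
function hostnameToUnicode(hostname) {
  return hostname
    .split('.')
    .map((label) =>
      label.toLowerCase().startsWith('xn--') ? decodeLabel(label.slice(4)) : label
    )
    .join('.');
}

console.log(hostnameToUnicode('obs.xn--brgerhaus-q9a.de')); // obs.bürgerhaus.de
```

In the browser you could call hostnameToUnicode(document.location.hostname) to get the spelling with the umlaut. If the code runs in Node.js rather than in the browser, the built-in url module already provides url.domainToUnicode() for the same purpose.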

r3dst0rm