0

I have some Windows 7 machines on a local network, each with a hostname. One of them is running a Java 8 application that must access another machine via HTTP, so it needs to form a URI containing the server machine's hostname. The machine running the HTTP server has a hostname containing Japanese characters, which are not allowed in URIs.

How do I construct the URI to access the server machine on the local network? RFC 4501, "Domain Name System Uniform Resource Identifiers", says to encode according to RFC 3986, "Uniform Resource Identifier (URI): Generic Syntax", which I would infer to mean percent-encoding of the UTF-8 octets. But RFC 3490, "Internationalizing Domain Names in Applications (IDNA)", says to convert to Punycode.

So which is it, percent-encoding or Punycode? Which encoding will allow a Java application to look up and connect to another Windows 7 machine on the local network whose hostname contains extended characters?

Garret Wilson

2 Answers

1

DNS supports only ASCII in hostnames, so a hostname containing international characters must be registered in DNS, and written in URLs, in Internationalized Domain Name (IDN) form. In that form each non-ASCII label is normalized with Nameprep, encoded with Punycode, and given the prefix "xn--".

Also look at RFC 3987 Internationalized Resource Identifiers (IRIs), which allows international characters to be (largely) unencoded, and defines algorithms for converting between IRIs and URIs.

Remy Lebeau
  • Thanks for the response, Remy Lebeau, but you didn't provide an answer---you merely provided a link, and told me that hostnames must be encoded without telling me _how_ they must be encoded. – Garret Wilson Mar 28 '15 at 16:30
  • I told you how they are encoded in the URL - using IDN format, to match their DNS registration. – Remy Lebeau Mar 30 '15 at 21:08
0

The correct answer is "use Punycode for encoding the hostname when forming a URI". This is explained in the W3C article An Introduction to Multilingual Web Addresses, and specified in RFC 3987, "Internationalized Resource Identifiers (IRIs)".
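
As a minimal sketch of what that could look like in Java 8, the standard `java.net.IDN` class converts a Unicode hostname to its ASCII ("xn--") form, which can then be passed to the multi-argument `java.net.URI` constructor. The class name `IdnHostUri`, the helper names, the port, the path, and the sample hostname are all hypothetical, and the sketch does not verify that name resolution on a given local network accepts the ASCII form.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

final class IdnHostUri {

    /** Converts a Unicode hostname to its ASCII-compatible (Punycode) form. */
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    /** Builds an http URI whose host component is the ASCII form of the given hostname. */
    static URI httpUri(String unicodeHost, int port, String path) {
        try {
            // Arguments: scheme, userInfo, host, port, path, query, fragment
            return new URI("http", null, toAsciiHost(unicodeHost), port, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for host: " + unicodeHost, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical hostname: the characters for "Japanese" followed by "-pc",
        // written as Unicode escapes so the source file stays pure ASCII.
        String host = "\u65e5\u672c\u8a9e-pc";
        System.out.println(httpUri(host, 8080, "/status"));
    }
}
```

`IDN.toUnicode` performs the reverse conversion, which may be useful when showing the hostname to a user.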

Garret Wilson