15

I'm seeing strange behavior from URI.getHost():

    import java.net.URI;

    URI url = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    System.out.println(url.getHost()); // prints null
    URI url2 = new URI("https://s3.amazonaws.com");
    System.out.println(url2.getHost()); // prints s3.amazonaws.com

I want the first url.getHost() to return pmi_artifacts_prod.s3.amazonaws.com, but it returns null. It turns out the problem is the underscores in the domain name. It's a known bug, but what can I do, given that I need to work with exactly this host?

proxy
  • There is a great article about this here https://blogs.wandisco.com/java-and-underscores-in-host-names/ in short, yes you can do it (sort of)... but you **really shouldn't**. – Doug Sep 26 '19 at 08:18

3 Answers

9

The bug is not in Java but in the hostname itself: an underscore is not a valid character in a hostname. Although such names are widely used anyway, Java refuses to parse them as hosts.

https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames
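To make "refuses to handle" concrete, here is a minimal sketch of what java.net.URI does with such a name: parsing succeeds, getHost() returns null, and the raw text remains available through getAuthority(). The class name and the hostOrAuthority helper are hypothetical, written only for illustration, and the fallback parsing is deliberately naive (it is never reached for bracketed IPv6 literals, because those parse as ordinary hosts).

```java
import java.net.URI;

public class UnderscoreHostDemo {

    // Hypothetical helper: prefer getHost(), otherwise derive a host-like
    // string from the raw authority by stripping user info and port.
    static String hostOrAuthority(URI uri) {
        String host = uri.getHost();
        if (host != null) {
            return host;
        }
        String authority = uri.getAuthority();
        if (authority == null) {
            return null;
        }
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1); // drop "user:password@"
        }
        int colon = authority.lastIndexOf(':');
        if (colon >= 0) {
            authority = authority.substring(0, colon); // drop ":port"
        }
        return authority;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://pmi_artifacts_prod.s3.amazonaws.com");
        System.out.println(uri.getHost());        // null
        System.out.println(uri.getAuthority());   // pmi_artifacts_prod.s3.amazonaws.com
        System.out.println(hostOrAuthority(uri)); // pmi_artifacts_prod.s3.amazonaws.com
    }
}
```

This avoids reflection entirely, at the cost of doing the authority parsing yourself. When the authority is not a valid server-based one, getPort() also returns -1, so a port would have to be recovered the same way.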

A possible workaround:

import java.lang.reflect.Field;
import java.net.URI;
import java.net.URISyntaxException;

public static void main(String... a) throws URISyntaxException, NoSuchFieldException, IllegalAccessException {
    URI url = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    System.out.println(url.getHost()); // null

    URI uriObj = new URI("https://pmi_artifacts_prod.s3.amazonaws.com");
    if (uriObj.getHost() == null) {
        // Overwrite URI's private "host" field via reflection.
        // On recent JDKs this may require --add-opens java.base/java.net=ALL-UNNAMED.
        final Field hostField = URI.class.getDeclaredField("host");
        hostField.setAccessible(true);
        hostField.set(uriObj, "pmi_artifacts_prod.s3.amazonaws.com");
    }
    System.out.println(uriObj.getHost()); // pmi_artifacts_prod.s3.amazonaws.com

    URI url2 = new URI("https://s3.amazonaws.com");
    System.out.println(url2.getHost()); // s3.amazonaws.com
}
törzsmókus
Vurtatoo
    "Be conservative in what you send out, be liberal in what you accept." If people are putting underscores in their hostnames, a library used worldwide should handle them, not fail on them. Simply not robust, and shockingly bad move for a language as prominent as Java. – Keith Tyler Feb 09 '17 at 18:12
    I'm not terribly clear on how this is valid. For sure, I can verify that Java does indeed do this, but I don't understand how Java8 released in 2014 could have this restriction. Supposedly, [RFC2181](https://datatracker.ietf.org/doc/html/rfc2181#section-11) reversed this restriction and was proposed in 1997 (though I don't know when it was approved, to be fair). If 2181 reversed it, then why does Java still not allow it? – searchengine27 Sep 14 '21 at 15:34
    `The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name` – searchengine27 Sep 14 '21 at 15:36
2

Underscore support can be added to URI itself by using reflection to patch the private character masks it uses when parsing hostnames:

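import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URI;

public static void main(String[] args) throws Exception {
    // 35184372088832L == 1L << 45: the low-mask bit for '-' (code point 45).
    patchUriField(35184372088832L, "L_DASH");
    // 2147483648L == 1L << 31: the high-mask bit for '_' (code point 95 = 64 + 31).
    patchUriField(2147483648L, "H_DASH");

    URI s = URI.create("http://my_favorite_host:3892");
    // prints "my_favorite_host"
    System.out.println(s.getHost());
}

private static void patchUriField(long maskValue, String fieldName)
        throws NoSuchFieldException, IllegalAccessException {
    Field field = URI.class.getDeclaredField(fieldName);

    // Clear the FINAL flag so the static field can be overwritten.
    // This relies on JDK internals: on JDK 12 and later the "modifiers"
    // lookup is expected to fail with NoSuchFieldException.
    Field modifiers = Field.class.getDeclaredField("modifiers");
    modifiers.setAccessible(true);
    modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);

    field.setAccessible(true);
    field.setLong(null, maskValue);
}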
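The two magic numbers above are easier to trust once you see where they come from. The sketch below assumes the usual two-word bitmask layout for ASCII: characters 0 to 63 map to bits of a "low" long, and characters 64 to 127 map to bits of a "high" long. The lowMask and highMask helpers here are reimplemented for illustration and are not calls into URI's private code.

```java
public class UriMaskDemo {

    // Bit for a character in the "low" mask (code points 0..63), else 0.
    static long lowMask(char c) {
        return c < 64 ? 1L << c : 0L;
    }

    // Bit for a character in the "high" mask (code points 64..127), else 0.
    static long highMask(char c) {
        return (c >= 64 && c < 128) ? 1L << (c - 64) : 0L;
    }

    public static void main(String[] args) {
        System.out.println(lowMask('-'));  // 35184372088832 (1L << 45)
        System.out.println(highMask('-')); // 0
        System.out.println(lowMask('_'));  // 0
        System.out.println(highMask('_')); // 2147483648 (1L << 31)
    }
}
```

In other words, the patch leaves the low mask describing '-' and makes the high mask describe '_', so both characters are accepted wherever the dash mask is consulted.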
Nikita Koksharov
1

Note that although

new URI("https://pmi_artifacts_prod.s3.amazonaws.com");

does not throw, and the workaround provided by @Vurtatoo works for this case, that workaround cannot handle a URL such as https://a_b?c={1}.

I also found that

new URI("https://a_b?c={1}")

throws, but

new URI("https://a_b?c=1")

does not.

I'm not sure why that is, but my takeaway is that we should not make any assumptions about the implementation details of the Java URI class. If you have to use Java URI, it's probably better to fork the source code and make the changes you need.
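One likely explanation for the difference: curly braces are not legal URI characters and must be percent-encoded, so the exception comes from the query rather than from the underscore. The sketch below checks this by trying a few variants. The class name and the parses helper are hypothetical, introduced only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class IllegalQueryCharDemo {

    // Returns true if the string parses as a URI, false if URI rejects it.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("https://a_b?c={1}"));     // false: raw braces in the query
        System.out.println(parses("https://ab?c={1}"));      // false: same failure without an underscore
        System.out.println(parses("https://a_b?c=1"));       // true
        System.out.println(parses("https://a_b?c=%7B1%7D")); // true: braces percent-encoded
    }
}
```

So a URL with both an underscore host and braces in the query needs two separate fixes: encode the query before constructing the URI, then deal with the null host.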

Zidong