I am searching for a way to parse URLs/URIs in Java without having to worry about exceptions for common URLs. The built-in Java options I know of are java.net.URI and java.net.URL.

Each of these classes has major drawbacks:

  • java.net.URL cannot handle custom protocols/schemes and therefore fails with URLs like idontcare://bla.com/test
  • java.net.URI has no problems with custom protocols/schemes, but it fails if, for example, the query part contains an "illegal character" (a special character that is not URL-encoded). Therefore it fails, for example, on a URL like https://bla.example.org/css?family=Roboto:300|Roboto:300,400,500,700&lang=de
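The two failures can be reproduced with a short check like the one below (the class, constant, and helper names are illustrative). On a standard JDK, `java.net.URL` should reject the first URL with an "unknown protocol" `MalformedURLException`, while `java.net.URI` should reject the second with a `URISyntaxException`, because `|` is not an allowed query character.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public final class BuiltInParserFailures {

    // Example URLs taken from the question.
    static final String CUSTOM_SCHEME = "idontcare://bla.com/test";
    static final String PIPE_IN_QUERY =
            "https://bla.example.org/css?family=Roboto:300|Roboto:300,400,500,700&lang=de";

    /** Returns "ok" if java.net.URL accepts the string, otherwise the exception summary. */
    static String tryUrl(String s) {
        try {
            new URL(s); // constructor deprecated since Java 20, still usable here
            return "ok";
        } catch (MalformedURLException e) {
            return "MalformedURLException: " + e.getMessage();
        }
    }

    /** Returns "ok" if java.net.URI accepts the string, otherwise the exception summary. */
    static String tryUri(String s) {
        try {
            new URI(s);
            return "ok";
        } catch (URISyntaxException e) {
            return "URISyntaxException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {CUSTOM_SCHEME, PIPE_IN_QUERY}) {
            System.out.println(s);
            System.out.println("  URL: " + tryUrl(s));
            System.out.println("  URI: " + tryUri(s));
        }
    }
}
```

Each class accepts exactly the URL that the other one rejects, which is why neither works as a general-purpose parser on its own.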

Is there a universal and non-restrictive way to parse URLs in Java (one that does not throw errors for cases like those presented above)?

JMax
  • See https://stackoverflow.com/questions/10786042/java-url-encoding-of-query-string-parameters – y_ug Feb 20 '20 at 17:34
  • @y_ug That question is about the opposite direction: it is about building a correctly encoded URL. I have a `String` that contains a URL and just want to access its components, like the host name and protocol/scheme. – JMax Feb 20 '20 at 18:58
  • I see. Then those who do not want extra dependencies (like `UriComponents` from the Spring Framework) can write custom protocol/scheme handlers and/or implement `URLStreamHandlerFactory`, and enable them via the `java.protocol.handler.pkgs` system property or register them with `URL.setURLStreamHandlerFactory`. – y_ug Feb 21 '20 at 14:53
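A minimal sketch of the `URL.setURLStreamHandlerFactory` route from the comment above, assuming the goal is only to make `java.net.URL` accept the custom scheme for parsing, not to open connections. The class name, `SCHEME`, and `ParseOnlyHandler` are illustrative. One caveat: the factory can be set only once per JVM, so this fails in environments (for example some application servers) that already install their own factory.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public final class CustomSchemeDemo {

    // Hypothetical scheme name, taken from the question's example URL.
    private static final String SCHEME = "idontcare";

    static {
        // Can be called only once per JVM; a second call throws an Error.
        // Returning null for other protocols lets the JDK use its default handlers.
        URL.setURLStreamHandlerFactory(protocol ->
                SCHEME.equalsIgnoreCase(protocol) ? new ParseOnlyHandler() : null);
    }

    /** Uses URLStreamHandler's inherited parseURL logic and refuses to connect. */
    private static final class ParseOnlyHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("connections are not supported for " + u);
        }
    }

    public static URL parse(String spec) throws MalformedURLException {
        return new URL(spec); // constructor deprecated since Java 20, still available
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = parse("idontcare://bla.com/test");
        System.out.println(url.getProtocol() + " | " + url.getHost() + " | " + url.getPath());
    }
}
```

Since `java.net.URL` does not validate characters such as `|`, the question's second URL already parses through the built-in `https` handler, so with this factory installed both example URLs go through `java.net.URL`.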

I found that `UriComponents` from the Spring Framework can handle both URIs:

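```java
import org.springframework.web.util.UriComponents;
import org.springframework.web.util.UriComponentsBuilder;
String uri = ...; // the URL/URI string to parse
UriComponents uriComponents = UriComponentsBuilder.fromUriString(uri).build();
```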

You can check the source code of `UriComponentsBuilder` to see how it parses the string.

cassiomolin
  • If you are in a non-Spring application, the library `org.springframework:spring-web` and its dependencies are a lot of code (~3.4 MB at the moment for 5.2.3) just for parsing URIs. But the regex pattern `URI_PATTERN` in the linked code and the `fromUriString(String)` method seem to be of great value, and they are under the Apache 2 license :) – JMax Feb 20 '20 at 19:17
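Along the lines of the regex idea in the last comment, a dependency-free sketch can use the generic regular expression from RFC 3986, Appendix B. This is the RFC's reference pattern, not Spring's `URI_PATTERN`, which is a separate and more detailed expression. The sketch only splits the string into scheme, authority, path, query and fragment; it does not validate, percent-decode, or separate the host from user info and port. The class and record names are illustrative, and the record requires Java 16 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LenientUriParser {

    // Reference regex from RFC 3986, Appendix B. With DOTALL every input string
    // matches, so parsing never throws; invalid characters are kept as-is.
    private static final Pattern RFC3986 = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?",
            Pattern.DOTALL);

    /** Components of a URI reference; absent scheme/authority/query/fragment are null. */
    public record Parts(String scheme, String authority, String path,
                        String query, String fragment) {}

    public static Parts parse(String uri) {
        Matcher m = RFC3986.matcher(uri);
        if (!m.matches()) { // cannot happen for this pattern; kept as a guard
            throw new IllegalStateException("unexpected input: " + uri);
        }
        // Groups 2, 4, 5, 7 and 9 hold the components, as described in the RFC.
        return new Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9));
    }

    public static void main(String[] args) {
        System.out.println(parse("idontcare://bla.com/test"));
        System.out.println(parse(
                "https://bla.example.org/css?family=Roboto:300|Roboto:300,400,500,700&lang=de"));
    }
}
```

Because every input matches, `parse` never throws for the two URLs from the question. The trade-off is that malformed input is accepted silently, so any validation (for example of the host) has to happen afterwards.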