20

In Java, I'd like to replace the host part of a URL with a new host, where both the host and the URL are supplied as strings.

This should take into account the fact that the new host could include a port, as defined in the RFC.

So for example, given the following inputs

I should get the following output from a function that did this correctly

Does anyone know of any libraries or routines that do host replacement in a URL correctly?

EDIT: For my use case, I want the host replacement to match what a Java servlet would respond with. I tested this by running a local Java web server, calling it with curl -H 'Host:superduper.com:80' 'http://localhost:8000/testurl', and having that endpoint simply return the URL from request.getRequestURL().toString(), where request is an HttpServletRequest. It returned http://superduper.com/testurl, so it removed the default port for http, and that's what I'm aiming for as well.

Brad Parks

6 Answers

23

The Spring Framework provides the UriComponentsBuilder. You can use it like this:

import org.springframework.web.util.UriComponentsBuilder;

String initialUri = "http://localhost/me/out?it=5";
UriComponentsBuilder builder = UriComponentsBuilder.fromHttpUrl(initialUri);
String modifiedUri = builder.host("myserver").port("20000").toUriString();
System.out.println(modifiedUri);
// ==> http://myserver:20000/me/out?it=5

Note that the hostname and port must be provided in separate calls so that each is encoded correctly.

Marc
16

You were right to use java.net.URI. The host and port (and user/password, if they exist) are collectively known as the authority component of the URI:

public static String replaceHostInUrl(String originalURL,
                                      String newAuthority)
throws URISyntaxException {

    URI uri = new URI(originalURL);
    uri = new URI(uri.getScheme().toLowerCase(Locale.US), newAuthority,
        uri.getPath(), uri.getQuery(), uri.getFragment());

    return uri.toString();
}

(URI schemes are case-insensitive, and lowercase is the canonical form. So while the above code does not perfectly preserve all of the original URL’s non-authority parts, lowercasing the scheme does not change what the URL means. And, of course, it won’t affect the functionality of URL connections.)
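To make the "authority" terminology concrete, here is a minimal sketch that prints the pieces java.net.URI exposes. The class name, the split helper and the sample URL (including its user:pw part) are made up for illustration.

```java
import java.net.URI;

final class AuthorityParts {
    // Hypothetical helper: returns authority, user info, host and port as strings
    static String[] split(String url) {
        URI uri = URI.create(url);
        return new String[] {
            uri.getAuthority(),            // user info, host and port together
            uri.getUserInfo(),             // null when the URL has no user info
            uri.getHost(),
            String.valueOf(uri.getPort())  // -1 when the URL has no explicit port
        };
    }

    public static void main(String[] args) {
        // Sample URL chosen for illustration only
        for (String part : split("http://user:pw@www.test.com:4300/me/out?it=5")) {
            System.out.println(part);
        }
    }
}
```

Since replaceHostInUrl above passes newAuthority through as the whole authority, any user info in the original URL is dropped unless it is included in the new value.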

Note that some of your tests are in error. For instance:

assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super:443")); 
assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com:4300/me/out?it=5","super:80")); 

Although https://super/me/out?it=5 is functionally identical to https://super:443/me/out?it=5 (since the default port for https is 443), if you specify an explicit port in a URI, then the URI has a port specified in its authority and that’s how it should stay.
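As a quick illustration of that point, the sketch below builds the same URL with and without an explicit port. The class and helper names are hypothetical; only the java.net.URI constructor is real.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ExplicitPortDemo {
    // Hypothetical helper: builds a URL string from a scheme, authority, path and query
    static String withAuthority(String scheme, String authority, String path, String query) {
        try {
            return new URI(scheme, authority, path, query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URI parts", e);
        }
    }

    public static void main(String[] args) {
        // The explicit port is kept even though 443 is the default for https
        System.out.println(withAuthority("https", "super:443", "/me/out", "it=5"));
        // No port in the authority, so none appears in the result
        System.out.println(withAuthority("https", "super", "/me/out", "it=5"));
    }
}
```

java.net.URI does no default-port normalization of its own, so the two results are different strings and compare as unequal URIs.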

Update:

If you want an explicit but unnecessary port number to be stripped, you can use URL.getDefaultPort() to check for it:

public static String replaceHostInUrl(String originalURL,
                                      String newAuthority)
throws URISyntaxException,
       MalformedURLException {

    URI uri = new URI(originalURL);
    uri = new URI(uri.getScheme().toLowerCase(Locale.US), newAuthority,
        uri.getPath(), uri.getQuery(), uri.getFragment());

    int port = uri.getPort();
    if (port > 0 && port == uri.toURL().getDefaultPort()) {
        uri = new URI(uri.getScheme(), uri.getUserInfo(),
            uri.getHost(), -1, uri.getPath(),
            uri.getQuery(), uri.getFragment());
    }

    return uri.toString();
}
Community
VGR
  • Hmmm... interesting.... I'll have to review this... thanks! and here's a [repl of your solution](https://repl.it/MXr6/2) as well, with the tests adjusted as you suggested. – Brad Parks Oct 10 '17 at 15:03
  • Hey! I tried this out by running a local java webserver, and then posting to it using `curl -H 'Host:superduper.com:80' 'http://localhost:8000/testurl'` and having that endpoint simply return the url from `request.getRequestURL().toString()`, where request is a `HttpServletRequest`. It returned `http://superduper.com/testurl`, so it removed the default port for http. So for my use case, I want my host replacement to match what a java servlet would respond with, which doesn't seem to match with this approach. Does that sound right to you? – Brad Parks Oct 11 '17 at 12:53
  • 1
    Updated answer with code that strips default port numbers. – VGR Oct 11 '17 at 13:25
  • 1
    Thanks... Here's an [updated repl](https://repl.it/MXr6/3) showing that this works with the original test cases too! – Brad Parks Oct 11 '17 at 13:41
  • 1
    The encoding information would be lost if uri.getQuery() contains escaped characters – machinarium Jan 21 '19 at 07:59
  • 1
    @machinarium That's true, but it might be okay for most uses. As far as I can tell, it only seems to change the characters that didn't really need to be encoded. %20 survives the round-trip. But %33 will be changed into a 3. – Steve Onorato Apr 05 '19 at 09:02
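The round-trip issue raised in these comments can be sketched as follows. This is only an illustration, not part of the answer: the class and method names are invented, and replaceRaw assumes a hierarchical URL (one with a "//authority" part). It reassembles the URL from the raw, still-encoded components instead of the decoded ones.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RawReplaceDemo {
    // Approach from the answer: decoded getters fed into the multi-argument constructor
    static String replaceDecoded(String url, String newAuthority) {
        try {
            URI uri = new URI(url);
            return new URI(uri.getScheme(), newAuthority,
                uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL or authority", e);
        }
    }

    // Variant (an assumption, not the answer's code): keep the raw components untouched
    static String replaceRaw(String url, String newAuthority) {
        try {
            URI uri = new URI(url);
            StringBuilder sb = new StringBuilder();
            sb.append(uri.getScheme()).append("://").append(newAuthority);
            if (uri.getRawPath() != null) sb.append(uri.getRawPath());
            if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
            if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
            // Re-parse so that an invalid authority is still rejected
            return new URI(sb.toString()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL or authority", e);
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost/me/out?q=a%26b";
        // The encoded ampersand is decoded and not re-encoded, which changes the query
        System.out.println(replaceDecoded(url, "myserver:20000"));
        // The raw variant leaves %26 as it was
        System.out.println(replaceRaw(url, "myserver:20000"));
    }
}
```

An encoded ampersand is a case where the decoded round trip is not harmless: once %26 becomes a literal &, the query splits into different parameters.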
4

I quickly tried java.net.URI, javax.ws.rs.core.UriBuilder, and org.apache.http.client.utils.URIBuilder. None of them seemed to account for a host header that may include a port, so from what I could see each needed some extra logic to work correctly; otherwise the port was "doubled up" at times and not replaced correctly at other times.

Since java.net.URL doesn't require any extra libraries, I used it. I know that URL.equals could be a problem, as it may perform DNS lookups, but I'm not calling it anywhere, so I think this is fine. It covers my use cases, as shown by the pseudo unit test.

I put together this way of doing it, which you can test out online at repl.it!

import java.net.URL;
import java.net.MalformedURLException;

class Main 
{
  public static void main(String[] args) 
  {
    testReplaceHostInUrl();
  }

  public static void testReplaceHostInUrl()
  {
    assertEquals("http://myserver:20000/me/out?it=5", replaceHostInUrl("http://localhost/me/out?it=5","myserver:20000")); 
    assertEquals("http://myserver:20000/me/out?it=5", replaceHostInUrl("http://localhost:19000/me/out?it=5","myserver:20000")); 
    assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://localhost:19000/me/out?it=5","super")); 
    assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com/me/out?it=5","super")); 
    assertEquals("https://myserver:20000/me/out?it=5", replaceHostInUrl("https://localhost/me/out?it=5","myserver:20000")); 
    assertEquals("https://myserver:20000/me/out?it=5", replaceHostInUrl("https://localhost:19000/me/out?it=5","myserver:20000")); 
    assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com/me/out?it=5","super")); 
    assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super")); 
    assertEquals("https://super/me/out?it=5", replaceHostInUrl("https://www.test.com:4300/me/out?it=5","super:443")); 
    assertEquals("http://super/me/out?it=5", replaceHostInUrl("http://www.test.com:4300/me/out?it=5","super:80")); 
    assertEquals("http://super:8080/me/out?it=5", replaceHostInUrl("http://www.test.com:80/me/out?it=5","super:8080")); 
    assertEquals("http://super/me/out?it=5&test=5", replaceHostInUrl("http://www.test.com:80/me/out?it=5&test=5","super:80")); 
    assertEquals("https://super:80/me/out?it=5&test=5", replaceHostInUrl("https://www.test.com:80/me/out?it=5&test=5","super:80")); 
    assertEquals("https://super/me/out?it=5&test=5", replaceHostInUrl("https://www.test.com:80/me/out?it=5&test=5","super:443")); 
    assertEquals("http://super:443/me/out?it=5&test=5", replaceHostInUrl("http://www.test.com:443/me/out?it=5&test=5","super:443")); 
    assertEquals("http://super:443/me/out?it=5&test=5", replaceHostInUrl("HTTP://www.test.com:443/me/out?it=5&test=5","super:443")); 
    assertEquals("http://SUPERDUPER:443/ME/OUT?IT=5&TEST=5", replaceHostInUrl("HTTP://WWW.TEST.COM:443/ME/OUT?IT=5&TEST=5","SUPERDUPER:443")); 
    assertEquals("https://SUPERDUPER:23/ME/OUT?IT=5&TEST=5", replaceHostInUrl("HTTPS://WWW.TEST.COM:22/ME/OUT?IT=5&TEST=5","SUPERDUPER:23")); 
    assertEquals(null, replaceHostInUrl(null, null));
  }

  public static String replaceHostInUrl(String url, String newHost)
  {
    if (url == null || newHost == null)
    {
      return url;
    }

    try
    {
      URL originalURL = new URL(url);

      boolean hostHasPort = newHost.indexOf(":") != -1;
      int newPort = originalURL.getPort();
      if (hostHasPort)
      {
        URL hostURL = new URL("http://" + newHost);
        newHost = hostURL.getHost();
        newPort = hostURL.getPort();
      }
      else
      {
        newPort = -1;
      }

      // Use implicit port if it's a default port
      boolean isHttps = originalURL.getProtocol().equals("https");
      boolean useDefaultPort = (newPort == 443 && isHttps) || (newPort == 80 && !isHttps);
      newPort = useDefaultPort ? -1 : newPort;

      URL newURL = new URL(originalURL.getProtocol(), newHost, newPort, originalURL.getFile());
      String result = newURL.toString();

      return result;
    }
    catch (MalformedURLException e)
    {
      throw new RuntimeException("Couldn't replace host in url, originalUrl=" + url + ", newHost=" + newHost, e);
    }
  }

  public static void assertEquals(String expected, String actual)
  {
    // Null-safe comparison: two nulls are equal, null vs. non-null is a failure
    boolean equal = (expected == null) ? (actual == null) : expected.equals(actual);
    if (! equal)
      throw new RuntimeException("Not equal! expected:" + expected + ", actual:" + actual);

    System.out.println("TEST PASSED, expected:" + expected + ", actual:" + actual);
  }
}
Brad Parks
  • 2
    I'm impressed that the answer and solution were posted at the *exact* same time :) – achAmháin Oct 10 '17 at 12:56
  • 2
    @pruntlar Creating a question to directly answer it yourself is encouraged as it helps others with similar problems ([SO: self-answer](https://stackoverflow.com/help/self-answer)). – Zabuzard Oct 10 '17 at 12:57
  • @pruntlar you should see that the one who asked and the one who answered are the same person – azro Oct 10 '17 at 12:57
  • 4
    Yeah I tend to do this if I searched for something, didn't find the answer, and want to document it somewhere in case I ever need it again. It's supported by StackOverflow directly, to help share knowledge and foster discussion. The reason I do this is because there are probably better answers out there than mine, and if there are, I'll switch to using them, but for the time being, this works for my use case. Thanks! – Brad Parks Oct 10 '17 at 12:58
  • 1
    I'd improve the question though, I find it a bit short and had you not posted an answer I'd be inclined to ask what you've tried and maybe even close it. One thing that would spring to my mind would be to use a regex to replace the host. There obviously are drawbacks/pitfalls with this but you could point out those requirements in the question. – Thomas Oct 10 '17 at 13:03
  • Good point @Thomas - I just did some improvements, and am open to better solutions provided by others - this is just what I came up with quick, so maybe a regex solution will show up ;-) – Brad Parks Oct 10 '17 at 13:07
  • @BradParks I don't think unit tests should be part of the answer – Willi Mentzel Oct 10 '17 at 13:14
  • 1
    @WilliMentzel - thanks, but I think they need to be there, as it helps verify edge cases, and makes it easy for others to test/compare their solutions in the online java repl too. – Brad Parks Oct 11 '17 at 16:15
  • There are a couple of issues with this solution. +1 for using just what's available in the standard library, though. I always hate seeing answers that amount to "just use this big 3rd-party library for this one handy function." Anyhow, this doesn't work with IPv6 literals. Maybe nobody cares about that, but this won't work universally without some changes. Also, there are specific checks for HTTP and HTTPS, but the URL class already knows the default ports for those protocols, so you could use `URL.getDefaultPort` and remove those hard-coded checks. – Christopher Schultz Jul 14 '21 at 15:27
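The two suggestions in the last comment (IPv6 literals and URL.getDefaultPort) could be sketched roughly like this. It is an illustration only: the class and method names are made up, and it assumes the scheme is one that java.net.URL has a handler for, such as http or https.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

final class HostHeaderDemo {
    // Hypothetical helper: returns host[:port], dropping the port when it is the scheme's default
    static String normalizeAuthority(String scheme, String hostHeader) {
        try {
            // Let URI split host and port so that "[::1]:8080" is handled correctly
            URI parsed = new URI(scheme + "://" + hostHeader);
            String host = parsed.getHost();   // IPv6 literals keep their square brackets
            if (host == null) {
                throw new IllegalArgumentException("Not a host[:port] value: " + hostHeader);
            }
            int port = parsed.getPort();      // -1 when no port was given
            int defaultPort = parsed.toURL().getDefaultPort();
            return (port == -1 || port == defaultPort) ? host : host + ":" + port;
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Invalid scheme or host: " + hostHeader, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeAuthority("http", "superduper.com:80"));
        System.out.println(normalizeAuthority("https", "[2001:db8::1]:8443"));
        System.out.println(normalizeAuthority("http", "[::1]:80"));
    }
}
```

Splitting on the first ":" would break on the IPv6 examples, which is why the sketch leaves the parsing to java.net.URI.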
2

I realize this is a pretty old question, but I'm posting a simpler solution in case someone else needs it.

String newUrl = new URIBuilder(URI.create(originalURL)).setHost(newHost).build().toString();
Saad Nawaz
  • 5
    When you rely on external dependencies to do something, it's appropriate to mention which one. I guess you're using `httpcomponents:httpclient`'s `URIBuilder`?! – Renato Oct 25 '19 at 06:48
1

I've added a method to do this in the RawHTTP library, so you can simply do this:

URI uri = RawHttp.replaceHost(oldUri, "new-host");

Added in this commit: https://github.com/renatoathaydes/rawhttp/commit/cbe439f2511f7afcb89b5a0338ed9348517b9163#diff-ff0fec3bc023897ae857b07cc3522366

Feedback welcome, will release it soon.

Renato
-1

Or using some regex magic:

public static String replaceHostInUrl(String url, String newHost) {
    if (url == null || newHost == null) {
        return null;
    }
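    // Dots are escaped so they match literally; quoteReplacement guards against '$' or '\' in newHost
    String s = url.replaceFirst("(?i)(?<=(https?)://)(www\\.)?\\w*(\\.com)?(:\\d*)?", Matcher.quoteReplacement(newHost));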
    if (s.contains("http://")) {
        s = s.replaceFirst(":80(?=/)", "");
    } else if (s.contains("https://")) {
        s = s.replaceFirst(":443(?=/)", "");
    }
    Matcher m = Pattern.compile("HTTPS?").matcher(s);
    if (m.find()) {
        s = s.replaceFirst(m.group(), m.group().toLowerCase());
    }
    return s;
}
Luciano van der Veekens
  • Nice! And [here's a repl for it](https://repl.it/MXr6/1) that shows it works just as good as [my answer](https://stackoverflow.com/a/46667343/26510) – Brad Parks Oct 10 '17 at 13:58