
There are many guides on how to download a web page in Java given its URL. In that case, the page is fetched from whichever server DNS resolves the host name to (DNS returns the IP of one of the servers hosting the page).

My question is: given the IP of a specific server, how can I download a web page hosted on that server using its URL?

To clarify, consider a web site like Google, which is hosted on multiple servers. If I download the page using the 'www.google.com' URL alone, I get the page from one of the hosting servers (whichever one DNS selects). However, suppose I have the IP of one of the servers hosting www.google.com and I want to download the page specifically from that server. I cannot use the URL alone, since I would have no guarantee that the page came from the server I intended.

A complete answer to this question should also support HTTPS.

Rann Lifshitz

1 Answer

If the connection does not use SSL/TLS, sending a header such as Host: www.google.co.jp in your request should work, at least when the target HTTP server uses name-based virtual hosting: https://en.wikipedia.org/wiki/Virtual_hosting#Name-based

// Set this before HttpURLConnection is first used; otherwise the Host header below is silently ignored.
System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
URL url = new URL("http://172.217.26.100/about/"); // one of Google's IP addresses
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
// Request the Japanese Google site; change to `www.google.ca` to get the Canadian site.
conn.setRequestProperty("Host", "www.google.co.jp");
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    System.out.println(reader.readLine()); // print the first line of the response body
}

Setting sun.net.http.allowRestrictedHeaders is required because, for security reasons, Java does not allow restricted headers such as Host to be overridden by default. See: Can I override the Host header where using java's HttpUrlConnection class?
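
For HTTPS, the Host header alone is usually not enough, because the TLS handshake also carries the host name (SNI) and the certificate is checked against it. One possible approach, offered as a minimal sketch rather than a tested solution, is to open a plain TCP socket to the chosen IP and then layer TLS on top of it with `SSLSocketFactory.createSocket(Socket, String, int, boolean)`, passing the real host name. The class name `PinnedHttpsFetch` and the helpers `buildRequest` and `fetch` are hypothetical names made up for this illustration, and port 443 and the 5-second connect timeout are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class PinnedHttpsFetch {

    // Builds a minimal HTTP/1.1 GET request. "Connection: close" asks the
    // server to close the connection after responding, so we can read to EOF.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Connects to the given IP, but performs the TLS handshake and the HTTP
    // request as if talking to `host`. Returns the raw response (headers + body).
    static String fetch(String ip, String host, String path) throws IOException {
        Socket plain = new Socket();
        try {
            // TCP connection goes to the chosen IP (assumed port 443, 5 s timeout).
            plain.connect(new InetSocketAddress(ip, 443), 5000);

            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            // Layer TLS over the existing socket; the host name passed here is
            // the one the TLS layer associates with the peer.
            try (SSLSocket tls = (SSLSocket) factory.createSocket(plain, host, 443, true)) {
                // Ask JSSE to verify that the certificate matches the host name.
                SSLParameters params = tls.getSSLParameters();
                params.setEndpointIdentificationAlgorithm("HTTPS");
                tls.setSSLParameters(params);
                tls.startHandshake();

                OutputStream out = tls.getOutputStream();
                out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
                out.flush();

                // Read everything until the server closes the connection.
                InputStream in = tls.getInputStream();
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            plain.close(); // make sure the underlying socket is released on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: PinnedHttpsFetch <ip> <host> <path>");
            return;
        }
        System.out.println(fetch(args[0], args[1], args[2]));
    }
}
```

It could be run as, for example, `java PinnedHttpsFetch 172.217.26.100 www.google.co.jp /about/`. Note that the returned string is the raw response, including the status line and headers, and the body may use chunked transfer encoding, so a real client would still need to parse it.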

ymonad
  • I used Google as a simple example. I'm not sure whether this approach would work with more complex URLs (ones that include paths, etc.). – Rann Lifshitz Aug 08 '16 at 08:25
  • @RannLifshitz I updated the answer with a path. I cannot guarantee that the code works on *all* sites, but name-based virtual hosting is widely used, so it should work on most of them. Please post another question if you find a site where an IP address request with a `host` header does not work. – ymonad Aug 08 '16 at 08:53
  • How would you suggest handling HTTPS-based URLs (when trying to fetch them based on the server IP)? – Rann Lifshitz Aug 12 '16 at 15:24