
I'm implementing a web crawler, and I use the InetAddress class to resolve domain names to IP addresses. For the domain en.wikipedia.org I got the IP 208.80.154.224. Now I'm trying to fetch the page /wiki/Cricket from that server using the jsoup parser, but I get the error below:

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=http://208.80.154.224/wiki/Cricket
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:459)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:434)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:181)
    at OtherClasses.TestDownloadJSoup.main(TestDownloadJSoup.java:30)
Java Result: 1

My code for fetching the page is:

Connection con = Jsoup.connect("http://208.80.154.224/wiki/Cricket")
                        .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")
                        .timeout(1000*5)
                        .followRedirects(true)
                        .referrer("http://www.google.com");

What should I do to resolve this 404 error? Even when I enter this IP in a browser, I get a "domain not configured on this server" error.

dvhh
Varun Raval

1 Answer


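Some servers implement virtual hosting, meaning that one server (on one IP address) can serve multiple domain names and decides which site to serve based on the Host header of the request. When you request the bare IP address, the Host header carries the IP instead of the domain name, so the server cannot tell which site you want and answers with 404.
You should add a Host header to your request: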

System.setProperty("sun.net.http.allowRestrictedHeaders", "true"); // this line is important to allow change in the Host header
Connection con = Jsoup.connect("http://208.80.154.224/wiki/Cricket")
                    .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")
                    .timeout(1000*5)
                    .followRedirects(true)
                    .header("Host","en.wikipedia.org") // new entry here
                    .referrer("http://www.google.com");

See this answer for more information.
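
To see the effect of the Host header without depending on Wikipedia or jsoup, here is a minimal sketch that uses only the JDK. It starts a local server that behaves like a name-based virtual host and sends two requests to the same IP address over a plain socket, one with the IP as Host and one with a domain name. The class name `VirtualHostDemo`, the hostname `en.example.test`, and the response bodies are assumptions made up for this illustration; a real server's configuration will differ.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class VirtualHostDemo {

    // Starts a local server on a free port that chooses its response from the
    // Host header, the way a name-based virtual host does.
    // "en.example.test" is a hypothetical site name used only for this demo.
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        server.createContext("/", exchange -> {
            String host = exchange.getRequestHeaders().getFirst("Host");
            boolean known = host != null && host.startsWith("en.example.test");
            byte[] body = (known ? "Cricket article" : "Domain not configured")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(known ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Connects to the loopback IP address, sends a GET with the given Host
    // header, and returns the first line of the response (the status line).
    public static String statusLine(int port, String hostHeader) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            String request = "GET /wiki/Cricket HTTP/1.1\r\n"
                    + "Host: " + hostHeader + "\r\n"
                    + "Connection: close\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            StringBuilder line = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\r') {
                line.append((char) c);
            }
            return line.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        int port = server.getAddress().getPort();
        try {
            // Same IP and path both times; only the Host header differs.
            System.out.println(statusLine(port, "127.0.0.1:" + port));
            System.out.println(statusLine(port, "en.example.test"));
        } finally {
            server.stop(0);
        }
    }
}
```

The first request should come back with a 404 status and the second with a 200, even though both go to the same address. Because the demo writes the request over a raw socket, it does not need the sun.net.http.allowRestrictedHeaders property; that property matters when the request goes through HttpURLConnection, which jsoup uses.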

Community
dvhh