
Using this code, I am currently reading an XML file, and it works fine on my personal Ubuntu PC:

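    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    URL url = new URL("https://www.google.com/site-map-all.xml");
    InputStream inputFile = url.openStream();
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();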
    doc.getDocumentElement().normalize();

But when I run the same code on an Ubuntu server, it shows this error:

    java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/sitemap.xml
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)

Can anyone help me find the issue? What is the problem on the server?

Asked by Zakaria Shahed (edited by Michael Kay).
Wikipedia on [HTTP 403](https://en.wikipedia.org/wiki/HTTP_403): "The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated." – user85421, Aug 09 '18 at 09:52

Take a look at https://stackoverflow.com/questions/12732422/adding-header-for-httpurlconnection and experiment with the request headers; for example, set the `User-Agent` header. – Aug 09 '18 at 10:22

2 Answers

I just needed to set the `User-Agent` request header:

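    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    URL url = new URL("https://www.google.com/sitemap.xml");
    URLConnection urlc = url.openConnection();
    // Send a browser-like User-Agent instead of Java's default one
    urlc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; "
            + "Windows NT 5.1; en-US; rv:1.8.0.11)");
    InputStream inputFile = urlc.getInputStream();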
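A slightly fuller sketch that ties this fix back to the DOM code in the question: it sets the header, parses the response with `DocumentBuilder`, and closes the stream afterwards. The class `SitemapFetcher`, its method names and the User-Agent value are hypothetical choices for illustration; the assumption, based on this answer, is that the server refuses Java's default agent string and accepts a browser-like one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class; names are illustrative, not from the original answer.
class SitemapFetcher {

    // Assumed browser-like value; HttpURLConnection otherwise sends "Java/<version>".
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)";

    // Creates (but does not yet send) a request with the User-Agent header set.
    static URLConnection openWithUserAgent(String uri) throws IOException {
        URLConnection connection = new URL(uri).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    // Parses an XML stream into a normalized DOM Document, as in the question.
    static Document parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        doc.getDocumentElement().normalize();
        return doc;
    }

    // Fetches and parses a remote document; getInputStream() sends the request.
    static Document fetch(String uri)
            throws ParserConfigurationException, SAXException, IOException {
        try (InputStream in = openWithUserAgent(uri).getInputStream()) {
            return parse(in);
        }
    }
}
```

Usage would then be a single call such as `Document doc = SitemapFetcher.fetch("https://www.google.com/sitemap.xml");`. Keeping the connection setup separate from parsing makes it easy to reuse the same header logic for other URLs.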
Answered by Zakaria Shahed.

Adding some more information here in case it helps others.

Firstly, the basic technique given in other answers is correct: when you get an HTTP 403 error from a Java program (such as an XML parser) that is attempting to access an HTTP resource, but typing the same URI into your web browser is successful, then you may need to set up request headers that mislead the site into thinking that the request is coming from a browser.

One current example I've found where this happens is the schema at https://www.musicxml.org/xsd/xml.xsd.

If there's a single file you need, and you are invoking the parser on that file directly, you can create an InputSource "by hand" and pass it to the XML parser. Assuming that what you are doing is parsing XML, you can follow the code suggested by @zsbappa:

    import java.net.URL;
    import java.net.URLConnection;
    import org.xml.sax.InputSource;

    URLConnection connection = new URL(uriString).openConnection();
    connection.setRequestProperty("User-Agent",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.connect();
    InputSource inputSource = new InputSource(connection.getInputStream());
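One detail worth considering with a hand-made `InputSource`: a bare stream gives the parser no base URI, so relative references inside the file (a DTD or entity next to it on the server, say) have nothing to resolve against. Recording the original URI with `setSystemId` addresses that. A minimal sketch, where `BrowserInputSources` and its methods are hypothetical names, `uriString` stands for whatever URI you fetch, and the User-Agent value is the one from the snippet above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; the class and method names are illustrative only.
class BrowserInputSources {

    // Example browser-like value, taken from the snippet above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
            + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    // Fetches uriString with a browser-like User-Agent and wraps the response body.
    static InputSource open(String uriString) throws IOException {
        URLConnection connection = new URL(uriString).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return withSystemId(connection.getInputStream(), uriString);
    }

    // Wraps a stream and records its URI, giving the parser a base URI
    // against which relative references (DTDs, entities) can be resolved.
    static InputSource withSystemId(InputStream in, String systemId) {
        InputSource source = new InputSource(in);
        source.setSystemId(systemId);
        return source;
    }

    // Parses the InputSource into a DOM Document.
    static Document parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }
}
```

Typical use would be `Document doc = BrowserInputSources.parse(BrowserInputSources.open(uriString));`.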

But if you're reading the file via an XSLT processor such as Saxon, or if the file contains references to other files that the XML parser also needs to read (for example DTDs, external entities, or schema documents), then it's not quite so easy. In such cases you need to configure an EntityResolver on the parser. It will typically look something like this:

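    import java.net.URL;
    import java.net.URLConnection;
    import org.xml.sax.InputSource;

    xmlReader.setEntityResolver((publicId, systemId) -> {
      // Fetch http: and https: resources with a browser-like User-Agent
      if (systemId != null
          && (systemId.startsWith("http:") || systemId.startsWith("https:"))) {
        URLConnection connection = new URL(systemId).openConnection();
        connection.setRequestProperty("User-Agent",
           "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        connection.connect();
        InputSource source = new InputSource(connection.getInputStream());
        source.setSystemId(systemId); // keep the base URI for relative references
        return source;
      } else {
        // null tells the parser to resolve the entity in its usual way
        return null;
      }
    });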
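When the resolver has to be named as a class rather than written as a lambda (for example the `MyEntityResolver` in the Saxon snippet), it can be an ordinary class with a no-argument constructor. The sketch below uses the hypothetical name `BrowserAgentEntityResolver` and accepts both `http:` and `https:` system IDs, since resources like the musicxml.org schema are served over HTTPS. Because Saxon refers to the resolver by class, it presumably needs to be a public class in its own file; it is package-private here only to keep the sketch in one block. The same object could also be passed to `DocumentBuilder.setEntityResolver` for DOM parsing.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical resolver class; the name is illustrative only.
class BrowserAgentEntityResolver implements EntityResolver {

    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
            + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    // No-argument constructor so the class can be instantiated reflectively.
    public BrowserAgentEntityResolver() {
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId != null
                && (systemId.startsWith("http:") || systemId.startsWith("https:"))) {
            URLConnection connection = new URL(systemId).openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            InputSource source = new InputSource(connection.getInputStream());
            // Keep the original URI so references inside the entity resolve relative to it.
            source.setSystemId(systemId);
            return source;
        }
        // null means "use the parser's default behaviour" (file: URIs and so on).
        return null;
    }

    // Creates a namespace-aware SAX XMLReader with this resolver installed.
    static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(new BrowserAgentEntityResolver());
        return reader;
    }
}
```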

If you're calling Saxon and Saxon is calling the XML parser, you can supply your EntityResolver to Saxon either as an option on the Transform command line (-er:classname) or as an option on the Saxon Configuration. For example:

    transformerFactory.setAttribute(
      FeatureKeys.ENTITY_RESOLVER_CLASS, MyEntityResolver.class);
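If setting a Saxon-specific option is inconvenient, a JAXP-only alternative (a different technique from the one above) is to give the transformer a `SAXSource` wrapping an `XMLReader` that already has the resolver installed; a JAXP transformer, Saxon included, should then parse the principal source document with that reader. This only covers that one document: files loaded later, for example through `document()` in a stylesheet, go through the transformer's own resolution, so the configuration option above remains the more complete route. In this sketch `ResolverAwareTransform` is a hypothetical name and an identity transform stands in for a real stylesheet.

```java
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper showing the SAXSource route; names are illustrative.
class ResolverAwareTransform {

    // Runs an identity transform over input, parsing it with the given resolver.
    static String identity(InputSource input, EntityResolver resolver) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);

        // The transformer parses the input with this reader, so DTD and
        // external-entity lookups for this document go through the resolver.
        SAXSource source = new SAXSource(reader, input);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With a real stylesheet you would create the transformer from it (for example `newTransformer(new StreamSource(xslFile))`) and pass a resolver such as the one sketched earlier.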
Answered by Michael Kay.