
Requirement - Download a file from a website

Issue - The program fails with the error below:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.nseindia.com/content/historical/EQUITIES/2015/FEB/cm25FEB2015bhav.csv.zip
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1840)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at java.net.URL.openStream(URL.java:1045)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1460)

The download works fine if I follow the manual steps below:

  1. Open a browser with the link - weblink
  2. Select Bhavcopy from the Select Report drop-down field
  3. Specify the date as 25-02-2015
  4. Click on the file - cm25FEB2015bhav.csv.zip

But if I paste the file path - link - directly into the browser, it gives a 403 error.

I believe that since the download does not work directly through the link in the browser, my program is also unable to download the file. I tried the suggestions mentioned in the threads - Thread1, Thread2, Thread3 - but they did not help.

Query: Is there any way to circumvent this blocking by the server through Java code? I need to download the files for several dates, so manual clicking is not feasible.

Code:

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.io.FileUtils;

public static void main(String[] args) throws IOException {
    String urlPath = "https://www.nseindia.com/content/historical/EQUITIES/2015/FEB/cm25FEB2015bhav.csv.zip";
    URL url = new URL(urlPath);
    // Option 1:
    URLConnection conn = url.openConnection();
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    conn.connect();
    // Option 2:
//    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
//    connection.addRequestProperty("Referer", "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm");
//    connection.setRequestMethod("GET");

    String zipBhavCopy = "C:\\zipBhavCopy.zip";

    FileUtils.copyURLToFile(new URL(urlPath), new File(zipBhavCopy));
}

1 Answer


This web site checks the user agent and the referer URL in an HTTP request.

So you need to add the referer URL to your request.

The HTTP referer (originally a misspelling of referrer[1]) is an HTTP header field that identifies the address of the webpage (i.e. the URI or IRI) that linked to the resource being requested. By checking the referrer, the new webpage can see where the request originated.

Add this line to your code.

conn.setRequestProperty("Referer", "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm");

The HTTP server checks this field to prevent abuse.

  • That does not work. I have already tried it; see option 2 in the code pasted in my initial query – iCoder Feb 27 '17 at 04:57
  • I can download it with command `$ curl -v -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" -e "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm" https://www.nseindia.com/content/historical/EQUITIES/2015/FEB/cm25FEB2015bhav.csv.zip -o temp.zip` . It seems that user agent and refer are both required. – user2541463 Feb 27 '17 at 05:08
  • It may work with the curl command, but it isn't working via the Java program. Also, I cannot download each file using console commands. – iCoder Feb 27 '17 at 05:36
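
A likely reason curl succeeds while the Java program still fails, based on the code in the question: `FileUtils.copyURLToFile(URL, File)` opens its own connection internally (via `URL.openStream()`), so the `User-Agent` and `Referer` headers set on `conn` are never actually sent. A minimal sketch that sends both headers on the same connection it reads from - the class name and the `copyStream` helper are mine, the header values are copied from the question and the curl comment above, and whether the server still accepts them is an assumption:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BhavCopyDownloader {

    // Copy all bytes from in to out. Used instead of FileUtils.copyURLToFile,
    // which opens a fresh connection and therefore drops any headers
    // set on an earlier URLConnection.
    static long copyStream(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String urlPath = "https://www.nseindia.com/content/historical/EQUITIES/2015/FEB/cm25FEB2015bhav.csv.zip";
        HttpURLConnection conn = (HttpURLConnection) new URL(urlPath).openConnection();
        // Send both headers, mirroring the curl command that works:
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        conn.setRequestProperty("Referer",
                "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm");

        // Read from the SAME connection the headers were set on.
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream("C:\\zipBhavCopy.zip")) {
            copyStream(in, out);
        }
    }
}
```

For several dates, the same connection setup can be wrapped in a loop that formats each date into the URL pattern shown in the question.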