
So I've been trying to download a PDF from a URL that is password protected. I can visit the webpage using Jsoup, but Jsoup doesn't support PDF files (the URL is a link to a PDF file). How do I make sure I don't have to re-enter the username and password? I can't use URLConnection because that doesn't allow me to log into the website. Thanks for the help.

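    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    // Method name is illustrative; ID is assumed to be the document's version id as a String
    static void downloadReport(String ID) throws IOException {
        System.out.println("opening connection");
        URL url = new URL("https://www.HIDDEN.com/ciqdotnet/login.aspx?redirect=%2fCIQDotNet%2fFilings%2fDocumentRedirector.axd%3fversionId%3d" + ID + "%26type%3dpdf%26forcedownload%3dfalse");
        // try-with-resources closes both streams even if reading or writing fails
        try (InputStream in = url.openStream();
             FileOutputStream fos = new FileOutputStream("/Users/HIDDEN/Desktop/fullreport.pdf")) {
            System.out.println("reading file...");
            byte[] buffer = new byte[1024]; // buffer for one chunk of data from the connection
            int length;
            while ((length = in.read(buffer)) != -1) {
                fos.write(buffer, 0, length);
            }
        }
        System.out.println("file was downloaded");
    }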
Serpemes

1 Answer


You need to add the credentials to the HTTP header of the URL connection.
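A minimal sketch of the header approach with plain `HttpURLConnection`, assuming the server accepts HTTP Basic authentication. A form-based login page such as `login.aspx` usually does not, so treat this as illustrative; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the value of an HTTP Basic "Authorization" header: "Basic " + base64(user:password)
    static String basicAuthValue(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    // Opens (but does not yet send) a GET request carrying the credentials in its header.
    // Assumption: the server uses HTTP Basic auth; an HTML login form will ignore this header.
    static HttpURLConnection openWithBasicAuth(URL url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", basicAuthValue(user, password));
        return conn;
    }

    public static void main(String[] args) {
        // Example credentials from RFC 7617
        System.out.println(basicAuthValue("Aladdin", "open sesame")); // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

The connection is then read like `url.openStream()`: `try (InputStream in = openWithBasicAuth(url, user, pw).getInputStream()) { ... }`.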

If you're already logged in, you need to extract the session cookie from the cookie store and send it along with the request.
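A minimal sketch of sending an existing session cookie by hand, assuming you have copied the cookie name(s) and value(s) from a logged-in browser session. The names `ASP.NET_SessionId` and `.ASPXAUTH` below are hypothetical placeholders, as are the class and method names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SessionCookieRequest {

    // Formats cookies as one request header value: "name1=value1; name2=value2"
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Opens a request that replays an existing login session by sending its cookie(s)
    static HttpURLConnection openWithCookies(URL url, Map<String, String> cookies) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Cookie", cookieHeader(cookies));
        return conn;
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        // Hypothetical names and values; use the real ones from your logged-in session
        cookies.put("ASP.NET_SessionId", "abc123");
        cookies.put(".ASPXAUTH", "XYZ");
        System.out.println(cookieHeader(cookies)); // prints "ASP.NET_SessionId=abc123; .ASPXAUTH=XYZ"
    }
}
```

Session cookies expire, so this only works while the browser session is still valid.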

If all this sounds too complicated, use Apache HttpComponents. The framework has all kinds of support code to set up your request, add user/password credentials and/or handle cookies.
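For comparison, the JDK alone can also log in through a form and keep the resulting cookie. This is a minimal sketch using `HttpURLConnection` and `CookieManager` instead of HttpComponents, assuming the login page accepts a plain form POST. The field names `username`/`password` are hypothetical; read the real `name` attributes from the login page's HTML, and note that ASP.NET login pages typically also expect hidden fields such as `__VIEWSTATE`. Once a default `CookieManager` is installed, a later `url.openStream()` on the PDF URL in the same JVM sends the stored session cookie back.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormLogin {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e); // cannot happen
        }
    }

    // Encodes fields as application/x-www-form-urlencoded: k1=v1&k2=v2
    static String formEncode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // POSTs the login form. The default CookieManager stores cookies the server sets,
    // and later HttpURLConnection requests in this JVM send them back automatically.
    static int postLogin(URL loginUrl, Map<String, String> fields) throws IOException {
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
        byte[] body = formEncode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) loginUrl.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "me@example.com"); // hypothetical field names and values
        fields.put("password", "s3cret pw");
        System.out.println(formEncode(fields)); // prints "username=me%40example.com&password=s3cret+pw"
    }
}
```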

[EDIT] You can find sample code for Apache HttpClient (part of HttpComponents) here: https://hc.apache.org/httpcomponents-client-ga/examples.html

HttpClient can do the "download" part of a web browser. In a nutshell, url.openStream() sends a GET request to the server; HttpClient sends the same kind of request but also lets you attach credentials and cookies to it.
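A minimal JDK-only sketch of that GET (roughly what `url.openStream()` does), using `HttpURLConnection` rather than HttpClient, with one extra safeguard: it checks that the body starts with the `%PDF-` signature before saving. If the server answers with the HTML login page instead of the report, the download fails loudly rather than producing a broken `fullreport.pdf`. Class and method names are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PdfDownload {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the first len bytes start with the "%PDF-" signature that PDF files begin with
    static boolean looksLikePdf(byte[] head, int len) {
        if (len < PDF_MAGIC.length) return false;
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) return false;
        }
        return true;
    }

    // Sends a GET and saves the body only if it really is a PDF
    static void download(URL url, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + status + " for " + url);
        }
        try (InputStream in = new BufferedInputStream(conn.getInputStream())) {
            in.mark(PDF_MAGIC.length);
            byte[] head = new byte[PDF_MAGIC.length];
            int n = 0;
            while (n < head.length) { // read() may return fewer bytes than requested
                int r = in.read(head, n, head.length - n);
                if (r == -1) break;
                n += r;
            }
            if (!looksLikePdf(head, n)) {
                throw new IOException("Not a PDF (login page?), Content-Type: " + conn.getContentType());
            }
            in.reset(); // rewind so the signature bytes are written too
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] html = "<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdf, pdf.length) + " " + looksLikePdf(html, html.length)); // prints "true false"
    }
}
```

Usage would look like `PdfDownload.download(new URL(pdfUrl), java.nio.file.Paths.get("/Users/HIDDEN/Desktop/fullreport.pdf"));`, where `pdfUrl` is a placeholder for the document URL.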

You can find an example of how to authenticate against a server here: https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientAuthentication.java
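The JDK has a built-in analogue for HTTP-level authentication: `java.net.Authenticator`. This is a minimal sketch, not the Apache example itself. It only helps when the server answers with an HTTP 401 challenge (e.g. Basic), not with an HTML login form. The host name and credentials are hypothetical.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class HostAuthenticator extends Authenticator {

    private final String host;
    private final String user;
    private final char[] password;

    HostAuthenticator(String host, String user, char[] password) {
        this.host = host;
        this.user = user;
        this.password = password;
    }

    // Called by the JDK when a server requests HTTP authentication.
    // Only hands out credentials to the expected host; returns null otherwise.
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        if (host.equalsIgnoreCase(getRequestingHost())) {
            return new PasswordAuthentication(user, password);
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical host and credentials; installed JVM-wide for HttpURLConnection
        Authenticator.setDefault(new HostAuthenticator("www.example.com", "me", "secret".toCharArray()));
        PasswordAuthentication pa = Authenticator.requestPasswordAuthentication(
                "www.example.com", null, 443, "https", "realm", "basic");
        System.out.println(pa == null ? "none" : pa.getUserName()); // prints "me"
    }
}
```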

If you're already logged in, you will have a session cookie. To pass that cookie to HttpClient, see the Stack Overflow question "Apache HttpClient 4.0.3 - how do I set cookie with sessionID for POST request".
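Without HttpClient, the JDK's `java.net.CookieManager` plays the role of a shared cookie store. This is a minimal sketch that simulates a login response setting a session cookie and shows that a later request to another path on the same host would carry it. The cookie name and URLs are hypothetical; in real use you would call `CookieHandler.setDefault(JAR)` before logging in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SharedCookies {

    // One cookie jar for the whole session. After java.net.CookieHandler.setDefault(JAR),
    // HttpURLConnection (and url.openStream()) store Set-Cookie headers here and replay them.
    static final CookieManager JAR = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

    // Records a Set-Cookie header as if it came from a response for the given URI
    static void remember(URI uri, String setCookieValue) {
        try {
            JAR.put(uri, Collections.singletonMap("Set-Cookie",
                    Collections.singletonList(setCookieValue)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Cookie header values the jar would send with a request to the given URI
    static List<String> cookiesFor(URI uri) {
        try {
            Map<String, List<String>> headers =
                    JAR.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> values = headers.get("Cookie");
            return values == null ? Collections.<String>emptyList() : values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate the login response setting a (hypothetical) session cookie...
        remember(URI.create("https://www.example.com/login.aspx"), "ASP.NET_SessionId=abc123; Path=/");
        // ...then check what a later request to the PDF URL on the same host would carry
        System.out.println(cookiesFor(URI.create("https://www.example.com/Filings/DocumentRedirector.axd")));
    }
}
```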

Aaron Digulla
  • I'm really sorry for asking this, but I'm not experienced at all and was wondering if you could tell me which part of Apache HttpComponents I would have to implement and give me some basic code showing how it's done. – Serpemes Aug 11 '15 at 15:06
  • Does HttpComponents support visiting links that point to PDF content? – Serpemes Aug 11 '15 at 15:14
  • I've added a couple of links and pointers that should get you started. Try it and when you get stuck, ask a new question. – Aaron Digulla Aug 12 '15 at 09:31