This code does not log me in to the website, even though I use the correct login information. The System.out.println call prints the HTML of the login page, which indicates that the login did not work. Can someone tell me what I'm forgetting or what's wrong with it?

public void connect() {

    try {
        Connection.Response loginForm = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/login.php")
                .method(Connection.Method.GET)
                .execute();

        org.jsoup.nodes.Document document = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/authentication.php")
                .data("cookieexists", "false")
                .data("username", "myUsername")
                .data("password", "myPassword")
                .cookies(loginForm.cookies())
                .post();
        System.out.println(document);
    } catch (IOException ex) {
        Logger.getLogger(WebCrawler.class.getName()).log(Level.SEVERE, null, ex);
    }
}
Matt
Serpemes

1 Answer

Besides the username, password and cookies, the site requires two additional values for the login: __VIEWSTATE and __EVENTVALIDATION.
You can get them from the response to the first GET request, like this:

Document doc = loginForm.parse();
Element e = doc.select("input[id=__VIEWSTATE]").first();
String viewState = e.attr("value");
e = doc.select("input[id=__EVENTVALIDATION]").first();
String eventValidation = e.attr("value");

Then add them to the POST request after the password (the order doesn't really matter):

org.jsoup.nodes.Document document = Jsoup.connect("https://www.capitaliq.com/CIQDotNet/Login.aspx/authentication.php")
            .userAgent("Mozilla/5.0")
            .data("myLogin$myUsername", "MyUsername")
            .data("myLogin$myPassword", "MyPassword")
            // Data about the login button, sent along with the credentials
            .data("myLogin$myLoginButton.x", "22")
            .data("myLogin$myLoginButton.y", "8")
            // Hidden values read from the login page returned by the first GET request
            .data("__VIEWSTATE", viewState)
            .data("__EVENTVALIDATION", eventValidation)
            .cookies(loginForm.cookies())
            .post();

I would also set the user agent on both requests. Some sites check it and send different pages to different clients, so to get the same response as in your browser, add .userAgent("Mozilla/5.0") (or your own browser's user-agent string) to each request.

Edit
The username field is named myLogin$myUsername and the password field myLogin$myPassword, and the POST request also contains data about the login button. I can't test it because I don't have an account on that site, but I believe it will work. Hope this solves your problem.

EDIT 2
To enable the remember me field during login, add this line to the post request:

.data("myLogin$myEnableAutoLogin", "on")
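As a minimal sketch, the form fields listed in this answer can also be gathered in one map and passed to the request in a single call, since Jsoup's Connection accepts a Map<String, String> through `data(...)`. The class and method names below (`LoginFields`, `build`) are hypothetical and only for illustration; the field names and button values are the ones given above, and none of this has been tested against the site.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects the login form fields named in the answer. */
final class LoginFields {

    private LoginFields() {}

    static Map<String, String> build(String username, String password,
                                     String viewState, String eventValidation,
                                     boolean rememberMe) {
        // LinkedHashMap keeps insertion order, although the order should not matter.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("myLogin$myUsername", username);
        fields.put("myLogin$myPassword", password);
        // Login button data, with the same values as in the answer.
        fields.put("myLogin$myLoginButton.x", "22");
        fields.put("myLogin$myLoginButton.y", "8");
        // Hidden values read from the login page returned by the first GET request.
        fields.put("__VIEWSTATE", viewState);
        fields.put("__EVENTVALIDATION", eventValidation);
        if (rememberMe) {
            // Corresponds to ticking the "remember me" checkbox.
            fields.put("myLogin$myEnableAutoLogin", "on");
        }
        return fields;
    }
}
```

The result would then replace the individual `.data(name, value)` calls, for example `.data(LoginFields.build("MyUsername", "MyPassword", viewState, eventValidation, true))`.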
TDG
  • Think I did exactly what you said but it still doesn't work. It's just giving me the code of the login page. I will post my code as an answer beneath. – Serpemes Aug 07 '15 at 14:54
  • if it's not working, it's not an answer - you should edit your question with the new code. I'll try to find another solution. – TDG Aug 07 '15 at 15:37
  • Thanks for answering my question again. Sadly it still doesn't work. I was thinking maybe the website has some sort of built in security against scrapers? – Serpemes Aug 08 '15 at 05:32
  • It can't be - a scraper imitates the browser, so the site can't tell between browser and scraper. Try to send the second request to this URL - `https://www.capitaliq.com/ciqdotnet/login.aspx?redirect=/CIQDotNet/Login.aspx/authentication.php`. – TDG Aug 08 '15 at 06:52
  • One more small question: how do I tick the "remember me" button? – Serpemes Aug 09 '15 at 06:01
  • I've edited my answer to include the 'remember me' option. – TDG Aug 09 '15 at 16:48
  • Thanks a lot, when I make a new page it seems to go back to the login page even after having ticked the remember me button. Do you have an idea of how I could remain logged in? – Serpemes Aug 10 '15 at 09:12
  • I basically just want to know how i can connect to a new page on the same website. – Serpemes Aug 10 '15 at 09:44
  • Do you always send back the cookies you're getting? – TDG Aug 10 '15 at 16:56
  • Not sure, I'll try again. Do I have to keep using the declared loginForm for that or do I make a new document? – Serpemes Aug 11 '15 at 02:13
  • "https://www.capitaliq.com/CIQDotNet/Filings/FilingsAnnualReports.aspx?CompanyId=18749&source=0", this is the link i need to go to after i've logged in – Serpemes Aug 11 '15 at 03:15
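
On the follow-up about staying logged in: separate `Jsoup.connect(...)` calls do not share cookies, so every later request has to carry them explicitly. Note that `.post()` returns only the parsed Document; executing the login with `.method(Connection.Method.POST).execute()` instead returns a `Connection.Response`, whose `cookies()` can be read. A minimal sketch of combining the cookies from the first GET and from the login response follows. The helper name `SessionCookies.merge` is hypothetical, and it is an untested assumption that the site keeps the session in cookies set by those two responses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: combines the cookies of several responses into one map. */
final class SessionCookies {

    private SessionCookies() {}

    /** Later maps win when the same cookie name appears more than once. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieSets) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieSets) {
            merged.putAll(cookies);
        }
        return merged;
    }
}
```

With a login response named, say, `loginResponse`, the merged map `SessionCookies.merge(loginForm.cookies(), loginResponse.cookies())` would be passed via `.cookies(...)` on each later request, such as the one for the FilingsAnnualReports.aspx page.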