
I am trying to navigate to the description page of the California website http://kepler.sos.ca.gov/, but I cannot get there.

I have an HTML form from which I submit the request. I cannot add the form here, but it is simply a POST request to http://kepler.sos.ca.gov/ with the required parameters.

I am able to get __EVENTTARGET and __EVENTARGUMENT from the previous page, the one I came here from.

What am I doing wrong?

code:

// The URL must include the protocol, otherwise Jsoup.connect rejects it
String url = "http://kepler.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url)
                                .timeout(30000)
                                .method(Connection.Method.GET)
                                .execute();
Document responseDocument = resp.parse();
Map<String, String> loginCookies = resp.cookies();
// Hidden ASP.NET fields that have to be sent back with the POST
Element eventValidation = responseDocument.select("input[name=__EVENTVALIDATION]").first();
Element viewState = responseDocument.select("input[name=__VIEWSTATE]").first();
BalusC
graphics123
  • Also I am able to get __EVENTVALIDATION and __VIEWSTATE. – graphics123 Jul 02 '15 at 17:23
  • Please post your code. – TDG Jul 02 '15 at 17:26
  • I am unable to add the code. Basically, it is Jsoup in a JSP page that gets the required data, then an HTML form that sends it to the California website – graphics123 Jul 02 '15 at 17:37
  • String url = "http://kepler.sos.ca.gov/"; Connection.Response resp = Jsoup.connect(url).timeout(30000) .method(Connection.Method.GET) .execute(); Document responseDocument = resp.parse(); Map loginCookies = resp.cookies(); eventValidation = responseDocument.select("input[name=__EVENTVALIDATION]").first(); viewState = responseDocument.select("input[name=__VIEWSTATE]").first(); – graphics123 Jul 02 '15 at 17:38
  • This looks OK to me, where is the `POST` request? – TDG Jul 02 '15 at 17:46
  • Also, the page is a JSP page. After getting the values, I render them as HTML form input values, then submit the form using JavaScript – graphics123 Jul 02 '15 at 17:47
  • and on javascript document.getElementById("CALI").submit(); – graphics123 Jul 02 '15 at 17:48
  • Please put the info in the question and not the comments. We still need to see your post request as you try it with Jsoup. Remember that Jsoup is not a JavaScript interpreter. – luksch Jul 02 '15 at 18:16
  • I am using Jsoup in the JSP to get the __VIEWSTATE, __EVENTVALIDATION, __EVENTTARGET and __EVENTARGUMENT values, which are later used in JavaScript – graphics123 Jul 02 '15 at 18:21

2 Answers


You want to use FormElement, a useful Jsoup feature. It finds the fields declared inside a form and posts them for you. Before posting the form, you can set the values of the fields through the Jsoup API.

Note:

In the sample code below, every call to the Element#select method is followed by a call to the Elements#first method.

For example: responseDocument.select("form#aspnetForm").first()

Jsoup 1.11.1 introduced a more efficient alternative, Element#selectFirst. You can use it as a direct replacement for the select-then-first pair.

For example:
responseDocument.select("form#aspnetForm").first()
can be replaced by
responseDocument.selectFirst("form#aspnetForm")
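
To check the equivalence without touching the network, here is a minimal sketch that parses a small inline document. The markup and the class name SelectFirstDemo are made up for illustration; only the selector comes from the answer. It assumes Jsoup 1.11.1 or later on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class SelectFirstDemo {
    // Hypothetical markup standing in for the real page
    static final String HTML =
        "<html><body><form id='aspnetForm'><input name='q'></form></body></html>";

    // Original style: build the full Elements list, then take its first entry
    static Element viaSelect(Document doc) {
        return doc.select("form#aspnetForm").first();
    }

    // Replacement style: ask for the first match directly
    static Element viaSelectFirst(Document doc) {
        return doc.selectFirst("form#aspnetForm");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // Both calls hand back the same node of the parsed tree
        System.out.println(viaSelect(doc) == viaSelectFirst(doc));
        // Both styles yield null when nothing matches, so a null check is still needed
        System.out.println(doc.selectFirst("form#missing") == null);
    }
}
```

Because both styles return null on a miss, the checkElement helper used in the sample code works unchanged with either one.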

SAMPLE CODE

// * Connect to website
String url = "http://kepler.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url) //
                                .timeout(30000) //
                                .method(Connection.Method.GET) //
                                .execute();

// * Find the form
Document responseDocument = resp.parse();
Element potentialForm = responseDocument.select("form#aspnetForm").first();
checkElement("form element", potentialForm);
FormElement form = (FormElement) potentialForm;

// * Fill in the form and submit it
// ** Search Type
Element radioButtonListSearchType = form.select("[name$=RadioButtonList_SearchType]").first();
checkElement("search type radio button list", radioButtonListSearchType);
radioButtonListSearchType.attr("checked", "checked");

// ** Name search
Element textBoxNameSearch = form.select("[name$=TextBox_NameSearch]").first();
checkElement("name search text box", textBoxNameSearch);
textBoxNameSearch.val("cali");

// ** Submit the form
Document searchResults = form.submit().cookies(resp.cookies()).post();

// * Extract results (entity numbers in this sample code)
for (Element entityNumber : searchResults.select("table[id$=SearchResults_Corp] > tbody > tr > td:first-of-type:not(td[colspan=5])")) {
    System.out.println(entityNumber.text());
}

public static void checkElement(String name, Element elem) {
    if (elem == null) {
        throw new RuntimeException("Unable to find " + name);
    }
}

OUTPUT (as of this writing)

C3036475
C3027305
C3236514
C3027304
C3034012
C3035110
C3028330
C3035378
C3124793
C3734637

See also:

In this example, we will log into the GitHub website by using the FormElement class.

// # Constants used in this example
final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"; 
final String LOGIN_FORM_URL = "https://github.com/login";
final String USERNAME = "yourUsername";  
final String PASSWORD = "yourPassword";  

// # Go to login page
Connection.Response loginFormResponse = Jsoup.connect(LOGIN_FORM_URL)
                                             .method(Connection.Method.GET)
                                             .userAgent(USER_AGENT)
                                             .execute();  

// # Fill the login form
// ## Find the form first...
FormElement loginForm = (FormElement)loginFormResponse.parse()
                                         .select("div#login > form").first();
checkElement("Login Form", loginForm);

// ## ... then "type" the username ...
Element loginField = loginForm.select("#login_field").first();
checkElement("Login Field", loginField);
loginField.val(USERNAME);

// ## ... and "type" the password
Element passwordField = loginForm.select("#password").first();
checkElement("Password Field", passwordField);
passwordField.val(PASSWORD);        


// # Now send the form for login
Connection.Response loginActionResponse = loginForm.submit()
         .cookies(loginFormResponse.cookies())
         .userAgent(USER_AGENT)  
         .execute();

System.out.println(loginActionResponse.parse().html());

public static void checkElement(String name, Element elem) {
    if (elem == null) {
        throw new RuntimeException("Unable to find " + name);
    }
}

All the form data is handled by the FormElement class for us (even the detection of the form method). Invoking the FormElement#submit method builds a ready-made Connection. All we have to do is complete this connection with additional headers (cookies, user agent, etc.) and execute it.
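
To see which fields FormElement would send, FormElement#formData can be inspected offline. The sketch below is only an illustration: the markup, field values and the class name FormDataDemo are hypothetical and do not come from either website. It assumes Jsoup 1.11.1 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

final class FormDataDemo {
    // Hypothetical ASP.NET-style form; names and values are made up
    static final String HTML =
          "<form id='aspnetForm' method='post' action='/search'>"
        + "<input type='hidden' name='__VIEWSTATE' value='abc123'>"
        + "<input type='hidden' name='__EVENTVALIDATION' value='xyz789'>"
        + "<input type='radio' name='SearchType' value='CORP'>"
        + "<input type='text' name='NameSearch' value=''>"
        + "</form>";

    // Fills the form like the answer does and returns the data that would be posted
    static Map<String, String> collect(boolean checkRadio, String name) {
        FormElement form = (FormElement) Jsoup.parse(HTML, "http://example.com/")
                                              .selectFirst("form#aspnetForm");
        if (checkRadio) {
            form.selectFirst("[name=SearchType]").attr("checked", "checked");
        }
        form.selectFirst("[name=NameSearch]").val(name);

        Map<String, String> data = new LinkedHashMap<>();
        for (Connection.KeyVal kv : form.formData()) {
            data.put(kv.key(), kv.value());
        }
        return data;
    }

    public static void main(String[] args) {
        // Hidden fields come along without any extra code
        System.out.println(collect(true, "cali"));
        // An unchecked radio button is left out of the posted data
        System.out.println(collect(false, "cali"));
    }
}
```

The hidden __VIEWSTATE and __EVENTVALIDATION inputs are picked up automatically, which is why the sample code never reads them by hand. The radio button only appears once its checked attribute is set.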

Stephan
  • This is the absolute best example of Jsoup I ever found. Thank you. – Luís Henriques Jun 26 '18 at 09:49
  • This answer is an absolute help! As an improvement: in my case I had to add the id and name of the button to get it clicked. I added .data("id_button","name_button") to the last loginForm.submit(). For me it was: Connection.Response loginActionResponse = loginForm.submit().data("id_button","name_button").cookies(loginFormResponse.cookies()).userAgent(USER_AGENT).execute(); – Farmaker Jul 23 '18 at 20:21
  • See my answer below, which is the exact same code presented here, except that it reflects the changes California made to their website after the original answer was posted. – mbmast Apr 09 '20 at 23:45

This is the exact same code as posted above in the accepted answer, except that it reflects the changes California made to their website after the original answer was posted. So as of my writing this, this code works. I've updated original comments, identifying any changes.

// * Connect to website (Original url: http://kepler.sos.ca.gov/)
String url = "https://businesssearch.sos.ca.gov/";
Connection.Response resp = Jsoup.connect(url) //
                                .timeout(30000) //
                                .method(Connection.Method.GET) //
                                .execute();

// * Find the form (Original jsoup selector: form#aspnetForm)
Document responseDocument = resp.parse();
Element potentialForm = responseDocument.select("form#formSearch").first();
checkElement("form element", potentialForm);
FormElement form = (FormElement) potentialForm;

// * Fill in the form and submit it
// ** Search Type (Original jsoup selector: [name$=RadioButtonList_SearchType])
Element radioButtonListSearchType = form.select("[name$=SearchType]").first();
checkElement("search type radio button list", radioButtonListSearchType);
radioButtonListSearchType.attr("checked", "checked");

// ** Name search (Original jsoup selector: name$=TextBox_NameSearch)
Element textBoxNameSearch = form.select("[name$=SearchCriteria]").first();
checkElement("name search text box", textBoxNameSearch);
textBoxNameSearch.val("cali");

// ** Submit the form
Document searchResults = form.submit().cookies(resp.cookies()).post();

// * Extract results (entity numbers in this sample code, original jsoup selector: id$=SearchResults_Corp)
for (Element entityNumber : searchResults.select("table[id$=enitityTable] > tbody > tr > td:first-of-type:not(td[colspan=5])")) {
    System.out.println(entityNumber.text());
}
mbmast
  • Can you please indicate more clearly the differences in your answer reflecting "the changes California made to their website"? – Stephan Apr 10 '20 at 07:09