I'm trying to go to the next page on an aspx form using JSoup.

I can find the next button itself. I just don't know what to do with it.

The idea is that, for that particular form, if the next button exists, we would simulate a click and go to the next page. But any solution other than simulating a click would be fine, as long as we get to the next page.

I also need to update the results once we go to the next page.

// Connecting, entering the data and making the first request

...

// Submitting the form
Document searchResults = form.submit().cookies(resp.cookies()).post();

// reading the data. Everything up to this point works as expected

...

// finding the next button (this part also works as expected)
Element nextBtn = searchResults.getElementById("ctl00_MainContent_btnNext");

if (nextBtn != null) {
    // click? I don't know what to do here.
    searchResults  = ??? // updating the search results to include the results from the second page
}

The page itself is www.somePage.com/someForm.aspx, so I can't use the solution stated here:

Android jsoup, how to select item and go to next page

I was unable to find any other suggestions.

Any ideas? What am I missing? Is simulating a click even possible with JSoup? The documentation says nothing about it, but I'm sure people manage to navigate this type of form.

Also, I'm working with Android, so I can't use HtmlUnit, as stated here:

importing HtmlUnit to Android project

Thank you.

Luís Henriques
  • observe the XHR requests. Check what happens if the "next" button is clicked. – Abhilash Jun 26 '18 at 17:30
  • refer this answer & find out what happens when the button is clicked. https://stackoverflow.com/a/4423097/8329042 – Abhilash Jun 26 '18 at 17:34
  • Thank you both. I was using that already. I'm just having a hard time isolating and understanding what happens when I click next. Maybe it's my inexperience with the inspector :p – Luís Henriques Jun 27 '18 at 09:21

2 Answers

This is not a job for Jsoup! Jsoup is a parser with a nice DOM API that lets you deal with wild HTML as if it were well-formed rather than riddled with errors and nonsense.

In your specific case you may be able to scrape the target site directly from your app by finding links and retrieving HTML pages recursively. Something like:

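import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

private void scrape(String url) throws IOException {
  Document doc = Jsoup.connect(url).get();
  // Analyze current document content here...
  // Then continue with the "next" element, selected by id (#), if it carries an href
  for (Element link : doc.select("#ctl00_MainContent_btnNext[href]")) {
    String next = link.absUrl("href"); // "" if the href cannot be made absolute
    if (!next.isEmpty()) {
      scrape(next);
    }
  }
}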

But in the general case, what you want to do requires far more functionality than Jsoup provides: a user agent capable of interpreting HTML, CSS and JavaScript, with a scriptable API that your app can call to simulate a click. For example, Selenium:

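import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

WebDriver driver = new FirefoxDriver();
driver.get("http://www.somePage.com/someForm.aspx"); // then fill in and submit the search form
driver.findElement(By.id("ctl00_MainContent_btnNext")).click(); // a real click; the browser runs the post-back JavaScript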

Selenium can't be bundled in an Android app, so I suggest you put your Selenium code on a server and make it accessible with some REST API.

Raffaele
  • That is a good idea, but sadly, it's beyond what I am "allowed" to do for this project. Looks like I misunderstood the purpose of Jsoup. Gonna read the documentation again. Thank you. – Luís Henriques Jun 27 '18 at 09:17
  • Jsoup may be enough. It all depends on the specific target site: if it's static HTML, Jsoup will handle it, and I provided a stub for the scrape routine. – Raffaele Jun 27 '18 at 09:28
  • Yes, I saw that. The problem is that I don't have a URL. It's www.somePage.com/someForm.aspx, and when I click the "next" button, even though the page changes, the URL stays exactly the same. I know something has to change internally, but I can't figure out what through the inspector. – Luís Henriques Jun 27 '18 at 09:34
  • The data is loaded with an XHR (JavaScript). 99% of the time you won't manage it without a full-blown user agent like Selenium. – Raffaele Jun 27 '18 at 09:37

Pagination on ASPX can be a pain. The best thing you can do is to use your browser to see the data parameters it sends to the server, then try to emulate this in code.

I've written a detailed tutorial on how to handle it here, but it uses the univocity HTML parser (which is commercial and closed source) instead of JSoup.

In short, you should try to get the <form> element with id="aspnetForm" and read its fields to build a POST request for the next page. The form data usually looks something like this:

__EVENTTARGET = 
__EVENTARGUMENT = 
__VIEWSTATE = /wEPDwUKMTU0OTkzNjExNg8WBB4JU29ydE9yZ ... a very long string
__VIEWSTATEGENERATOR = 32423F7A
... and other gibberish
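A minimal sketch of the "simulate a click" step in plain Java, assuming the current form fields have already been read into a map (with Jsoup, for example, from `FormElement.formData()` of the `#aspnetForm` form on the *results* page, since `__VIEWSTATE` changes after every post-back). `PostBack.nextPageParams` is a hypothetical helper, not part of Jsoup or ASP.NET. It covers the two usual ways a WebForms "next" control posts back: a submit button sends its own `name`/`value` pair, while a LinkButton calls `__doPostBack`, which fills in `__EVENTTARGET`. The control name to use is the button's `name` attribute (presumably something like `ctl00$MainContent$btnNext`), not its `id`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PostBack {
    private PostBack() {}

    /**
     * Returns the parameters for a post-back that "clicks" the given control.
     *
     * @param formFields     fields currently in the form (hidden fields such as __VIEWSTATE
     *                       plus the search inputs); this map is not modified
     * @param controlName    the control's name attribute, e.g. "ctl00$MainContent$btnNext"
     * @param isSubmitButton true for <input type="submit">, false for a LinkButton that
     *                       calls __doPostBack
     * @param buttonValue    the button's value attribute (ignored for LinkButtons)
     */
    static Map<String, String> nextPageParams(Map<String, String> formFields,
                                              String controlName,
                                              boolean isSubmitButton,
                                              String buttonValue) {
        Map<String, String> params = new LinkedHashMap<>(formFields);
        if (isSubmitButton) {
            // A submit button identifies itself by posting its own name/value pair;
            // __EVENTTARGET is left empty.
            params.put("__EVENTTARGET", "");
            params.put(controlName, buttonValue);
        } else {
            // __doPostBack(target, argument) copies target into __EVENTTARGET.
            params.put("__EVENTTARGET", controlName);
        }
        params.put("__EVENTARGUMENT", "");
        return params;
    }
}
```

The resulting map could then be posted to the same URL with something like `Jsoup.connect(url).data(params).cookies(cookies).post()`, which fits the observation that the address never changes. If the map also holds another submit button's name (such as the search button used for the first request), removing it is probably necessary, or the server may handle the wrong event.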

Then you need to look at each one of these and compare them with what your browser sends. Sometimes you need to get values from other elements of the page to build a matching POST request. You may also have to REMOVE some of the parameters you get. Again, make your code behave exactly like your browser.

After some (frustrating) trial and error you will get it working. The server should return a pipe-delimited result, which you can break down and parse. Something like:

25081|updatePanel|ctl00_ContentPlaceHolder1_pnlgrdSearchResult|
<div>
    <div style="font-weight: bold;">
        ... more stuff
|__EVENTARGUMENT||343908|hiddenField|__VIEWSTATE|/wEPDwU... another very long string ...1Pni|8|hiddenField|__VIEWSTATEGENERATOR|32423F7A| other gibberish
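Splitting that response naively on `|` breaks as soon as the HTML inside the update panel contains a pipe. The sketch below assumes the usual ASP.NET AJAX partial-postback layout, where every record is `length|type|id|content|` and `length` is the number of characters in `content`; the sample above fits that pattern (`8` precedes the 8-character `32423F7A`). `DeltaResponse`, `Entry`, `parse` and `find` are illustrative names, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

final class DeltaResponse {
    private DeltaResponse() {}

    /** One "length|type|id|content|" record, e.g. type "hiddenField", id "__VIEWSTATE". */
    static final class Entry {
        final String type;
        final String id;
        final String content;

        Entry(String type, String id, String content) {
            this.type = type;
            this.id = id;
            this.content = content;
        }
    }

    /** Walks the records using the length prefix, so pipes inside content are safe. */
    static List<Entry> parse(String response) {
        String s = response.trim(); // drop a trailing newline after the last '|'
        List<Entry> entries = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int lengthEnd = s.indexOf('|', pos);
            int typeEnd = lengthEnd < 0 ? -1 : s.indexOf('|', lengthEnd + 1);
            int idEnd = typeEnd < 0 ? -1 : s.indexOf('|', typeEnd + 1);
            if (idEnd < 0) {
                throw new IllegalArgumentException("Truncated record at offset " + pos);
            }
            int length = Integer.parseInt(s.substring(pos, lengthEnd).trim());
            int contentStart = idEnd + 1;
            int contentEnd = contentStart + length;
            // The content must be followed by the closing '|' of the record.
            if (contentEnd >= s.length() || s.charAt(contentEnd) != '|') {
                throw new IllegalArgumentException("Length prefix does not match at offset " + pos);
            }
            entries.add(new Entry(s.substring(lengthEnd + 1, typeEnd),
                                  s.substring(typeEnd + 1, idEnd),
                                  s.substring(contentStart, contentEnd)));
            pos = contentEnd + 1;
        }
        return entries;
    }

    /** Returns the content of the first entry with the given type and id, or null. */
    static String find(List<Entry> entries, String type, String id) {
        for (Entry e : entries) {
            if (e.type.equals(type) && e.id.equals(id)) {
                return e.content;
            }
        }
        return null;
    }
}
```

The `updatePanel` entry's content is an HTML fragment, so it could be handed to `Jsoup.parseBodyFragment(...)` to read the new page of results, while the `hiddenField` entries supply the values for the following request.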

From THAT sort of response you need to generate new POST requests for the subsequent pages, for example:

 String viewState = substringBetween(ajaxResponse, "__VIEWSTATE|", "|");

Then:

  request.setDataParameter("__VIEWSTATE", viewState); // with Jsoup: Connection.data("__VIEWSTATE", viewState)

There will be more data parameters to extract from each response, but a lot depends on the site you are targeting.
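The `substringBetween` call above is not part of the JDK (Apache Commons Lang provides a `StringUtils.substringBetween` with similar behaviour). For an Android app without that dependency, here is a small stand-in plus a hypothetical `refresh` helper that copies several hidden fields from one response into the parameters for the next request. It assumes the `|hiddenField|name|value|` layout shown in the sample response, and that the values contain no `|` (true for base64 strings such as `__VIEWSTATE`). Which field names need refreshing is site-specific.

```java
import java.util.Map;

final class HiddenFields {
    private HiddenFields() {}

    /** Text between the first occurrence of open and the next close, or null if absent. */
    static String substringBetween(String str, String open, String close) {
        int start = str.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = str.indexOf(close, start);
        return end < 0 ? null : str.substring(start, end);
    }

    /**
     * Copies the new value of each named hidden field from an AJAX response into
     * the parameters for the next request. Fields missing from the response keep
     * their previous value.
     */
    static void refresh(Map<String, String> params, String ajaxResponse, String... names) {
        for (String name : names) {
            // Anchoring on "|hiddenField|name|" avoids matching __VIEWSTATE inside
            // __VIEWSTATEGENERATOR.
            String value = substringBetween(ajaxResponse, "|hiddenField|" + name + "|", "|");
            if (value != null) {
                params.put(name, value);
            }
        }
    }
}
```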

Hope this helps a little.

Jeronimo Backes