I'm trying to parse an HTML page with a DOM parser and the jsoup library. The problem I'm facing is this:

The website has two buttons, each of which shows a different table. I need to parse the table that is shown when the second button is clicked. Clicking the second button sets different attribute values on the page.

When I call Jsoup.connect("example.com"), I get the response as if the first button were selected, and I don't need that data.

Is there a way to click the second button and then start parsing and retrieving data from the website?

Abra
Veljko

3 Answers

jsoup is just a parser, so it can't handle events such as button clicks. Have a look at browser automation tools (e.g. Selenium) for this kind of job.
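
As a rough sketch of that approach, the following lets Selenium perform the click and then hands the resulting markup to jsoup. It assumes Selenium 4 with a local Chrome installation. The CSS selectors `#second-button` and `#second-table` are hypothetical placeholders, since the question does not show the page's markup, and the URL is simply the one from the question with a scheme added.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SeleniumThenJsoup {

    // Hypothetical selectors: replace them with the real ones from the target page.
    static final String SECOND_BUTTON = "#second-button";
    static final String SECOND_TABLE = "#second-table";

    /** Pure jsoup step: returns the cell text of every row in the selected table. */
    static List<List<String>> readTable(String html, String tableSelector) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element row : doc.select(tableSelector + " tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("th, td")) {
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver(); // starts a real browser
        try {
            driver.get("https://example.com");
            // Perform the click that jsoup cannot do on its own.
            driver.findElement(By.cssSelector(SECOND_BUTTON)).click();
            // Wait until the second table is actually visible.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector(SECOND_TABLE)));
            // Hand the DOM as it looks after the click over to jsoup.
            String html = driver.getPageSource();
            readTable(html, SECOND_TABLE).forEach(System.out::println);
        } finally {
            driver.quit(); // always close the browser
        }
    }
}
```

Keeping the jsoup part in its own method means the parsing logic can be tried against a saved HTML file without starting a browser.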

sp00m

jsoup is an HTML parser, not a browser replacement. Take a look at HtmlUnit.

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
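
A minimal sketch of how this could look for the two-button page, assuming HtmlUnit 3.x (package `org.htmlunit`; the 2.x releases use `com.gargoylesoftware.htmlunit` instead). The selectors `#second-button` and `#second-table` are hypothetical, as is the five-second wait, and passing the result on to jsoup is optional, since HtmlUnit can also query the page itself.

```java
import java.io.IOException;

import org.htmlunit.WebClient;
import org.htmlunit.html.DomElement;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlUnitThenJsoup {

    // Hypothetical selectors: replace them with the real ones from the target page.
    static final String SECOND_BUTTON = "#second-button";
    static final String SECOND_TABLE = "#second-table";

    /** Pure jsoup step: counts the rows of the selected table. */
    static int countRows(String html, String tableSelector) {
        Document doc = Jsoup.parse(html);
        return doc.select(tableSelector + " tr").size();
    }

    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient()) {
            HtmlPage page = webClient.getPage("https://example.com");
            DomElement button = page.querySelector(SECOND_BUTTON);
            if (button == null) {
                throw new IllegalStateException("Button not found: " + SECOND_BUTTON);
            }
            // click() returns the page that results from the click.
            HtmlPage afterClick = button.click();
            // Give scripts triggered by the click some time to finish.
            webClient.waitForBackgroundJavaScript(5_000);
            // Serialize the updated DOM and parse it with jsoup.
            String html = afterClick.asXml();
            System.out.println("Rows in second table: " + countRows(html, SECOND_TABLE));
        }
    }
}
```

Because HtmlUnit runs without a visible browser window, this variant also works on machines that have no browser installed, although its JavaScript support may differ from what a real browser does.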

mtk

jsoup can't control the web page; it can only parse the content. For manipulation and interaction you need a different tool. I recommend Geb, which uses a Groovy DSL with a jQuery-like syntax that makes it very fluent. It also makes parsing XML/HTML pretty easy.

Will