
I'm new to Jsoup and not very experienced with HTML. I'm trying to get some data from a table on a website.

If I use Inspect Element in Chrome on the table I need, I get:

<table class=" table-main" id="tournamentTable!>
<colgroup>...</colgroup>
<tbody>...</tbody>
</table>

When I run the command

Elements e = doc.select("table.table-main");

and then print the contents of e, I see that it selects a different table on the page, one whose tag is <table class="table-main top-event">.

Since the class name in class=" table-main" contains a whitespace, I made other attempts, such as doc.select("table[class= table-main]");, but this returned an empty result (0 elements).

I looked at the page's HTML source and noticed that there is no table with the class name " table-main". Could this be the reason?

Pshemo
  • A space shouldn't be considered part of the class name. `Elements e = doc.select("table.table-main")` should return *all* tables that have the class `table-main`, and that doesn't exclude tables that also have other classes. If you want to pick the table that has only your class, try `doc.select("table[class=table-main]");` (without the space). – Pshemo Oct 17 '16 at 18:42
  • Have you inspected the source in chrome with disabled JavaScript? Always keep in mind that jsoup has no JavaScript support. State the actual url, otherwise we are just guessing around. – Frederic Klein Oct 17 '16 at 19:57
  • @Pshemo: It's one of the attempts I made before posting but it's not working – Lorenzo Dusty Costa Oct 18 '16 at 07:16
  • @FredericKlein I've just checked: the problem is JavaScript. With it disabled, I cannot see the table I'm interested in. So, if Jsoup has no JavaScript support, can you suggest another library that supports JavaScript? In case it's useful, this is the link: http://www.oddsportal.com/tennis/argentina/atp-buenos-aires/results/ Thanks – Lorenzo Dusty Costa Oct 18 '16 at 07:18
  • For a Java-only approach, have a look at HtmlUnit. The documentation section has a tutorial for using HtmlUnit in combination with jsoup: http://stackoverflow.com/documentation/jsoup/4632/parsing-javascript-generated-pages#t=201610180720238960961 If other tools/languages aren't a problem, I would give PhantomJS a try to fetch the rendered HTML source, then parse the file with jsoup. – Frederic Klein Oct 18 '16 at 07:25
  • @FredericKlein Thanks a lot. I will have a look at them. – Lorenzo Dusty Costa Oct 18 '16 at 07:41

1 Answer


Firstly, it looks as though the HTML is invalid: id="tournamentTable! instead of id="tournamentTable". This could cause the jsoup parser some difficulties.

Secondly, if you are attempting to select the tournament table cited in your example, then I would recommend selecting by id rather than by CSS class: doc.select("#tournamentTable").
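As a rough illustration of the difference between the two selectors, here is a minimal sketch. The markup in SAMPLE_HTML is a hypothetical stand-in modelled on the snippets in the question (with the id attribute written correctly), and the class and method names are made up for the example. It assumes jsoup is on the classpath. Note that, per the comments on the question, the real table is generated by JavaScript, so this only helps once you have the rendered HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // Hypothetical sample markup modelled on the question; not the real page.
    static final String SAMPLE_HTML =
          "<table class=\"table-main top-event\">"
        + "<tbody><tr><td>other</td></tr></tbody></table>"
        + "<table class=\" table-main\" id=\"tournamentTable\">"
        + "<tbody><tr><td>wanted</td></tr></tbody></table>";

    // The class selector matches every table that carries the class
    // table-main, regardless of any other classes on the element.
    static Elements byClass(Document doc) {
        return doc.select("table.table-main");
    }

    // The id selector targets the single element with that id
    // (first() returns null when nothing matches).
    static Element byId(Document doc) {
        return doc.select("#tournamentTable").first();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);

        System.out.println("Tables matched by class: " + byClass(doc).size());

        Element table = byId(doc);
        System.out.println(table == null
            ? "No element with id tournamentTable"
            : "Matched by id: " + table.text());
    }
}
```

Checking the result for null before using it is worthwhile here: if the id is missing from the HTML that jsoup actually receives, the selection will simply be empty.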

aengus