I recently ran into inconsistent Jsoup behavior regarding tbody tags. When I parse a remote web page with an HTML structure like:
<table>
<tbody>
<tr><td>... text
</tbody>
</table>
Jsoup does not include the tbody element in the elements returned by the select() method.
I use connect().get() to load the remote page into a Document, like this:
Document doc = Jsoup.connect(url).get();
String expr = "table>tr>td";
String parsedTxt = doc.select(expr).text();
But when I parse the same page from my local disk (after downloading it), Jsoup includes the tbody tag. My expression no longer works because it does not account for the tbody element.
I use:
File input = new File(locationOfFile);
Document doc = Jsoup.parse(input, "UTF-8", "");
My Jsoup expression works only in the first case.
Is there a way to force Jsoup to recognize the tbody element (or to remove it) so that the same expression can be used in both cases?
Is this normal behavior for Jsoup?
Should I use the connect method to parse the local page as well?
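To make the goal concrete, here is a minimal sketch of the kind of selector I am after. It assumes that a descendant combinator (a space instead of ">") between table and tr is acceptable, so the expression matches whether or not a tbody sits in between. The class name TbodyTolerantSelect, the helper cellText, and the inline HTML strings are hypothetical stand-ins for my real pages, and I have not confirmed that this is the recommended approach.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TbodyTolerantSelect {

    // Hypothetical helper: returns the text of all td cells that are
    // direct children of a tr located anywhere inside a table.
    // "table tr" uses a descendant combinator, so an intermediate
    // tbody (present or absent) does not affect the match.
    static String cellText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("table tr > td").text();
    }

    public static void main(String[] args) {
        // Stand-ins for the remote and the locally saved page.
        String withoutTbody = "<table><tr><td>text</td></tr></table>";
        String withTbody = "<table><tbody><tr><td>text</td></tr></tbody></table>";

        System.out.println(cellText(withoutTbody));
        System.out.println(cellText(withTbody));
    }
}
```

If this works as I expect, both calls should print the same cell text, and the same expression could be passed to doc.select() for the Document obtained from connect().get() and for the one obtained from Jsoup.parse(input, "UTF-8", "").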