I am trying to parse HTML using jsoup. This is my first time working with jsoup, although I have read some tutorials on it. Below is the HTML table I am trying to parse.
The table has three tr elements for now (I have shortened it to three rows to keep the example simple; in general there will be more). I would like to extract the Cluster Name from the table along with the corresponding Host Name values. For example, I would extract Titan as the cluster name and all of its hostnames whose status is down.
As you can see below, the Titan cluster has two hostnames, machineA.abc.com and machineB.abc.com. The status of machineA is up, but the status of machineB is down.
So I would print Titan as the cluster name and machineB.abc.com as the hostname, since it is down. Is this possible using jsoup?
<table border=1>
<tr>
<td> </td>
<td> </td>
<td>Alert</td>
<td>Cluster Name</td>
<td>IP addr</td>
<td>Host Name</td>
<td>Type</td>
<td>Status</td>
<td>Free</td>
<td>Version</td>
<td>Restart Time</td>
<td>UpTime(Days)</td>
<td>Last probed</td>
<td>Last up</td>
</tr>
<tr bgcolor="ffffff">
<td><a href=showlog?ip_addr=127.0.0.1>Hist</a></td>
<td><a href=http://127.0.0.1:8080/test?full=y>VI</a></td>
<td bgcolor="ffffff"> </td>
<td>Titan</td>
<td>10.100.111.77</td>
<td>machineA.abc.com</td>
<td></td>
<td bgcolor="ffffff">up</td>
<td bgcolor="ffffff" align=right>88%</td>
<td bgcolor="ffffff">2.0.5-SNAPSHOT</td>
<td bgcolor="ffffff">2014-07-04 01:49:08,220</td>
<td bgcolor="ffffff" align=right>381</td>
<td>07-14 20:01:59</td>
<td>07-14 20:01:59</td>
</tr>
<tr bgcolor="ffffff">
<td><a href=showlog?ip_addr=127.0.0.1>Hist</a></td>
<td><a href=http://127.0.0.1:8080/test?full=y>VI</a></td>
<td bgcolor="ffffff"> </td>
<td></td>
<td>10.200.192.99</td>
<td>machineB.abc.com</td>
<td></td>
<td bgcolor="ffffff">down</td>
<td bgcolor="ffffff" align=right>85%</td>
<td bgcolor="ffffff">2.0.5-SNAPSHOT</td>
<td bgcolor="ffffff">2014-07-04 01:52:20,613</td>
<td bgcolor="ffffff" align=right>103</td>
<td>07-14 20:01:59</td>
<td>07-14 20:01:59</td>
</tr>
</table>
So far I am able to load the whole HTML table using jsoup, but I am not sure how to extract the cluster name and the hostnames that are down:
// Fetch and parse the page; the second argument is the timeout in milliseconds
URL url = new URL("url_name");
Document doc = Jsoup.parse(url, 3000);
Update:
The table might contain two cluster names, as shown below:
<table border=1>
<tr>
<td> </td>
<td> </td>
<td>Alert</td>
<td>Cluster Name</td>
<td>IP addr</td>
<td>Host Name</td>
<td>Type</td>
<td>Status</td>
<td>Free</td>
<td>Version</td>
<td>Restart Time</td>
<td>UpTime(Days)</td>
<td>Last probed</td>
<td>Last up</td>
</tr>
<tr bgcolor="ffffff">
<td><a href=showlog?ip_addr=127.0.0.1>Hist</a></td>
<td><a href=http://127.0.0.1:8080/test?full=y>VI</a></td>
<td bgcolor="ffffff"> </td>
<td>Titan</td>
<td>10.100.111.77</td>
<td>machineA.abc.com</td>
<td></td>
<td bgcolor="ffffff">up</td>
<td bgcolor="ffffff" align=right>88%</td>
<td bgcolor="ffffff">2.0.5-SNAPSHOT</td>
<td bgcolor="ffffff">2014-07-04 01:49:08,220</td>
<td bgcolor="ffffff" align=right>381</td>
<td>07-14 20:01:59</td>
<td>07-14 20:01:59</td>
</tr>
<tr bgcolor="ffffff">
<td><a href=showlog?ip_addr=127.0.0.1>Hist</a></td>
<td><a href=http://127.0.0.1:8080/test?full=y>VI</a></td>
<td bgcolor="ffffff"> </td>
<td></td>
<td>10.200.192.99</td>
<td>machineB.abc.com</td>
<td></td>
<td bgcolor="ffffff">down</td>
<td bgcolor="ffffff" align=right>85%</td>
<td bgcolor="ffffff">2.0.5-SNAPSHOT</td>
<td bgcolor="ffffff">2014-07-04 01:52:20,613</td>
<td bgcolor="ffffff" align=right>103</td>
<td>07-14 20:01:59</td>
<td>07-14 20:01:59</td>
</tr>
<tr bgcolor="ffffff">
<td><a href=showlog?ip_addr=127.0.0.1>Hist</a></td>
<td><a href=http://127.0.0.1:8080/test?full=y>VI</a></td>
<td bgcolor="ffffff"> </td>
<td>Goldy</td>
<td>10.100.111.77</td>
<td>machineH.pqr.com</td>
<td></td>
<td bgcolor="ffffff">up</td>
<td bgcolor="ffffff" align=right>88%</td>
<td bgcolor="ffffff">2.0.5-SNAPSHOT</td>
<td bgcolor="ffffff">2014-07-04 01:49:08,220</td>
<td bgcolor="ffffff" align=right>381</td>
<td>07-14 20:01:59</td>
<td>07-14 20:01:59</td>
</tr>
</table>
As you can see above, there are now two cluster names, Titan and Goldy. I want to find all the machines that are down for the Titan cluster only.
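To make the requirement concrete, here is a minimal sketch of the filtering logic in plain Java, kept independent of how the cells are read. It rests on a few assumptions that are not confirmed anywhere above: the column positions (3 for Cluster Name, 5 for Host Name, 7 for Status) are taken from the header row of the sample table, and a row with an empty Cluster Name cell is assumed to belong to the most recent cluster named above it, as machineB appears to. The class name DownHostFinder and the helper row are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DownHostFinder {
    // Column positions read off the header row of the sample table (assumption).
    static final int CLUSTER_COL = 3;
    static final int HOST_COL = 5;
    static final int STATUS_COL = 7;

    /**
     * Returns the host names whose status is "down" within the given cluster.
     * Each inner list holds the cell text of one data row (header row excluded).
     * An empty cluster cell inherits the last non-empty cluster name above it.
     *
     * With jsoup, each row could probably be built along these lines (untested glue):
     *   for (Element tr : doc.select("table tr")) {
     *       List<String> cells = new ArrayList<>();
     *       for (Element td : tr.select("td")) cells.add(td.text());
     *   }
     */
    static List<String> downHosts(List<List<String>> rows, String cluster) {
        List<String> result = new ArrayList<>();
        String current = "";
        for (List<String> cells : rows) {
            if (cells.size() <= STATUS_COL) {
                continue; // skip rows that are too short to hold a status cell
            }
            String name = cells.get(CLUSTER_COL).trim();
            if (!name.isEmpty()) {
                current = name; // a new cluster starts on this row
            }
            boolean down = "down".equalsIgnoreCase(cells.get(STATUS_COL).trim());
            if (down && current.equals(cluster)) {
                result.add(cells.get(HOST_COL).trim());
            }
        }
        return result;
    }

    // Hypothetical helper: builds an 8-cell row with only the relevant columns filled.
    static List<String> row(String cluster, String host, String status) {
        String[] cells = new String[STATUS_COL + 1];
        Arrays.fill(cells, "");
        cells[CLUSTER_COL] = cluster;
        cells[HOST_COL] = host;
        cells[STATUS_COL] = status;
        return Arrays.asList(cells);
    }

    public static void main(String[] args) {
        // The three data rows from the second table, reduced to the relevant cells.
        List<List<String>> rows = Arrays.asList(
                row("Titan", "machineA.abc.com", "up"),
                row("", "machineB.abc.com", "down"),
                row("Goldy", "machineH.pqr.com", "up"));
        System.out.println("Titan: " + downHosts(rows, "Titan"));
    }
}
```

For the second table this prints Titan: [machineB.abc.com]. Each cell is read individually in the jsoup comment because the column indexes only line up if empty cells are kept in the row.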