Parse HTML table to Groovy list?

Question

I'd like to parse an HTML page and extract the values of a table, for example as a list of dictionaries (maps) in which each list element corresponds to one row of the table.

Let's say that the table is:

table

<table style="width:100%">
  <tr>
    <td>Jill</td>
    <td>Smith</td>      
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td>Jackson</td>        
    <td>94</td>
  </tr>
</table>

result

[Jill, Smith, 50]
[Eve, Jackson, 94]

I'm currently doing this in two ways:

Using an XPath-like GPath expression:
```
page.body.div.table.tr.time;
```

Using a closure like this:

page."**".findAll { it.@class.toString().contains("time") }.each {
    // ... handle each matching node
}

Both approaches use XmlSlurper with the TagSoup parser:

@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2')
def parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser())
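
For reference, here is a minimal sketch of how the closure-based search could be applied to the sample table above, so that it does not depend on a hardcoded path. It assumes the HTML is available as a string (the `html` variable is made up for illustration) and matches rows by tag name rather than by the "time" class, since the sample table has no classes. The `groovy.xml.XmlSlurper` import is for Groovy 3 and later; on Groovy 2.x the class lives in `groovy.util` and that import line should be removed.

```groovy
@Grab(group='org.ccil.cowan.tagsoup', module='tagsoup', version='1.2')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.XmlSlurper   // Groovy 3+ only; remove this line on Groovy 2.x

// Sample table from the question, inlined so the script is self-contained.
def html = '''
<table style="width:100%">
  <tr>
    <td>Jill</td>
    <td>Smith</td>
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td>Jackson</td>
    <td>94</td>
  </tr>
</table>
'''

def page = new XmlSlurper(new Parser()).parseText(html)

// Depth-first search for every <tr>, wherever it sits in the tree,
// then collect the trimmed text of each of its <td> children.
def rows = page.'**'.findAll { it.name() == 'tr' }.collect { row ->
    row.td.collect { it.text().trim() }
}

rows.each { println it }
```

Because the search is by element name, it keeps working if the table moves elsewhere in the page, at the cost of walking the whole tree.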

So, is there another way of getting the table values using Groovy?

Thanks for the help!

Any issues with either of the above ways due to which a third approach is required? — dmahapatro, May 08 '16 at 15:51
Should something in your example HTML have a class of "time"? — tim_yates, May 08 '16 at 18:19
1) My main concern with the first approach is that it is hardcoded, so it isn't flexible: if the page structure changes, it could return unexpected results. The second approach is the one I prefer right now; its only problems are the computational cost and the need for regular expressions in some cases. I was looking for a general solution similar to: http://stackoverflow.com/questions/6325216/parse-html-table-to-python-list — DataScientYst, May 09 '16 at 04:22

score 2 · Accepted Answer · edited May 23 '17 at 12:23


I have had good results using the jsoup HTML parser. It's a Java library, but it works well with Groovy. Here's an example of parsing a table in Java, and a helpful blog entry on scraping using Groovy and jsoup. This question has an answer with a Groovy example of parsing a table.
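
As a rough illustration of the jsoup approach, the sketch below parses the sample table from the question into a list of lists, and then into a list of maps. It is only one possible way to do it: the `html` variable, the jsoup version in `@Grab`, and the column names in `headers` are assumptions made for the example (the sample table has no header row).

```groovy
@Grab('org.jsoup:jsoup:1.15.3')
import org.jsoup.Jsoup

// Sample table from the question, inlined so the script is self-contained.
def html = '''
<table style="width:100%">
  <tr>
    <td>Jill</td>
    <td>Smith</td>
    <td>50</td>
  </tr>
  <tr>
    <td>Eve</td>
    <td>Jackson</td>
    <td>94</td>
  </tr>
</table>
'''

def doc = Jsoup.parse(html)

// One list per <tr>, holding the text of each <td> cell.
def rows = doc.select('table tr').collect { row ->
    row.select('td').collect { it.text() }
}

rows.each { println it }
// [Jill, Smith, 50]
// [Eve, Jackson, 94]

// Hypothetical column names (not present in the HTML) turn each row into a map.
def headers = ['first', 'last', 'score']
def records = rows.collect { cells ->
    [headers, cells].transpose().collectEntries()
}
println records
```

The CSS selector `table tr` matches rows at any depth inside the table, so it still works when the parser inserts a `tbody` element around the rows.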

answered May 09 '16 at 10:33 by Nicholas; edited May 23 '17 at 12:23 by Community

And this is the working example that I've found: http://stackoverflow.com/questions/5396098/how-to-parse-a-table-from-html-using-jsoup. There is a groovy version as well. Thank you. – DataScientYst May 09 '16 at 13:51
