<tr valign="middle" align="center"> 
<td><b>someNumbers</b></td>
<td width="22" height="22" background="..." class="SomeIntrestingClass">xxxxx</td>
<td width="22" height="22" background="..." class="SomeIntrestingClass">xgdsx</td> 
<td width="22" height="22" background="..." class="SomeIntrestingClass">xyzzx</td>
<td width="22">&nbsp;</td></tr>

I'm making an application that needs data from a website. I need to extract the value in 'someNumbers' and the values in the other `td` cells (e.g. 'xyzzx').
The problem is that the 'someNumbers' cell doesn't have a class, so I tried
`doc.getElementsByAttributeValue(key, value)`
but the same attributes appear in other parts of the document. How can I extract these values using Jsoup, or does anyone have other ideas? Thanks for any advice.

wtsang02
  • Can you select all the `td` and get only the text content? – nhahtdh Dec 22 '12 at 18:15
  • I can select the `td` tags, but that returns about 1k results, of which I only need about 30%, and the 'someNumbers' cells would be very hard to pick out. But I'll try that. – wtsang02 Dec 22 '12 at 18:18

2 Answers


Use `Document.select(...)`. This method accepts CSS selectors such as `td.class` or `tr td #id`, so in Jsoup you can query the document exactly as you would write a CSS selector, as described in this article.
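
A sketch of how `select` could be applied to the row in the question, assuming jsoup is on the classpath. It treats the cell containing a `<b>` as the marker of a row of interest and reads only the `td.SomeIntrestingClass` cells inside that same row, so identical cells elsewhere on the page are ignored. The class name `JsoupRowExtractor`, the `extract` method and the selector `tr:has(td > b)` are illustrative assumptions, not part of the answer; if other rows in the real page also contain bold cells, the selector would need tightening.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupRowExtractor {

    // The row from the question, wrapped in <table> so the HTML parser keeps
    // the <tr>/<td> structure (a bare <tr> outside a table is discarded).
    static final String SAMPLE =
        "<table><tr valign=\"middle\" align=\"center\">"
        + "<td><b>someNumbers</b></td>"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xxxxx</td>"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xgdsx</td>"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xyzzx</td>"
        + "<td width=\"22\">&nbsp;</td></tr></table>";

    // Maps the bold cell's text of each matching row to the texts of the
    // td.SomeIntrestingClass cells in that same row.
    static Map<String, List<String>> extract(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Assumption: rows of interest are those with a <b> directly inside a td.
        for (Element row : doc.select("tr:has(td > b)")) {
            String label = row.select("td > b").first().text();
            List<String> values = new ArrayList<>();
            // Searching within `row` only, so matching cells elsewhere are skipped.
            for (Element cell : row.select("td.SomeIntrestingClass")) {
                values.add(cell.text());
            }
            result.put(label, values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {someNumbers=[xxxxx, xgdsx, xyzzx]}
        System.out.println(extract(SAMPLE));
    }
}
```

Scoping the second `select` to the row element, rather than the whole document, is what keeps the unrelated cells with the same attributes out of the result.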

wtsang02

Use `<td[^<]+?>*</[^<]+?>` as the regular expression and store all the matches in an array.

Then clean up each match by removing `<td[^<]+?>` and then `</[^<]+?>`.
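
A minimal sketch of these two steps in Java with `java.util.regex`, keeping the answer's patterns unchanged; the class name `RegexTdExtractor` and the `SAMPLE` string are illustrative, not from the answer. Run against the row from the question, it returns `xxxxx`, `xgdsx`, `xyzzx` and `&nbsp;`. The `<td><b>someNumbers</b></td>` cell is never matched, because `[^<]+?` cannot step over the nested `<b>` tag, which is exactly the kind of fragility the comment below points to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTdExtractor {

    // The row from the question, as raw HTML text.
    static final String SAMPLE =
        "<tr valign=\"middle\" align=\"center\">\n"
        + "<td><b>someNumbers</b></td>\n"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xxxxx</td>\n"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xgdsx</td>\n"
        + "<td width=\"22\" height=\"22\" background=\"...\" class=\"SomeIntrestingClass\">xyzzx</td>\n"
        + "<td width=\"22\">&nbsp;</td></tr>";

    // Step 1: the cell pattern from the answer, unchanged.
    private static final Pattern CELL = Pattern.compile("<td[^<]+?>*</[^<]+?>");

    static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Step 2: strip the opening <td ...> tag, then the closing tag.
            String cell = m.group()
                .replaceAll("<td[^<]+?>", "")
                .replaceAll("</[^<]+?>", "");
            values.add(cell);
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints [xxxxx, xgdsx, xyzzx, &nbsp;]; the <b> cell is missed.
        System.out.println(extract(SAMPLE));
    }
}
```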

Mike Demen
  • Please read [this](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – wtsang02 Dec 22 '12 at 18:35