
I have a table without any class or id (there are other tables on the page) with this structure:

<table cellpadding="2" cellspacing="2" width="100%">
...  
     <tr>
          <td class="cell_c">...</td>
          <td class="cell_c">...</td>
          <td class="cell_c">...</td>
          <td class="cell">SOME_ID</td>
          <td class="cell_c">...</td>
     </tr>
...
</table>

I want to get only the one row that contains <td class="cell">SOME_ID</td>, where SOME_ID is passed in as an argument.


Update: currently I am doing it this way:

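import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Document doc = Jsoup.connect("http://www.bank.gov.ua/control/uk/curmetal/detail/currency?period=daily").get();

// Every <tr> of every table on the page
Elements rows = doc.select("table tr");
// Keep rows whose text mentions one of these currency codes
Pattern p = Pattern.compile("^.*(USD|EUR|RUB).*$");

for (Element trow : rows) {
    Matcher m = p.matcher(trow.text());
    if (m.find()) {
        System.out.println(m.group());
    }
}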

But why do I need Jsoup if most of the work is done by the regex? Just to download the HTML?
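One answer to "why Jsoup" is that its selector engine can do the matching the regex is doing here. Below is a minimal sketch, assuming (as in the table above) that the ID cell is the only cell in its row with class "cell" and that its text equals the ID exactly; the class name RowFinder, the method findRowById and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowFinder {

    // Returns the <tr> whose <td class="cell"> holds exactly the given ID, or null if no row does.
    // Assumption: the ID cell is the one with class "cell" (the other cells use "cell_c").
    public static Element findRowById(Document doc, String id) {
        // "td.cell" matches the whole class token, so "cell_c" cells are not selected
        for (Element td : doc.select("table tr > td.cell")) {
            // ownText() skips text from nested tags; trim() tolerates stray whitespace
            if (td.ownText().trim().equals(id)) {
                return td.parent(); // the enclosing <tr>
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical markup with the question's structure; in practice doc would come from Jsoup.connect(...).get()
        String html = "<table cellpadding=\"2\" cellspacing=\"2\" width=\"100%\">"
                + "<tr><td class=\"cell_c\">a</td><td class=\"cell\">USD</td><td class=\"cell_c\">1</td></tr>"
                + "<tr><td class=\"cell_c\">b</td><td class=\"cell\">EUR</td><td class=\"cell_c\">2</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        Element row = findRowById(doc, "EUR");
        System.out.println(row == null ? "not found" : row.text());
    }
}
```

Because the comparison is an exact match on a single cell, an ID that appears in another column, or inside a longer piece of text, does not produce a false hit the way ^.*(USD|EUR|RUB).*$ applied to the whole row text can.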

Vololodymyr

1 Answer


If the HTML structure is always the same and the element you want has no unique id or other identifying attribute, you can use Jsoup's CSS selector syntax to specify where in the DOM tree that element is located.

Consider this HTML source:

<html>
 <head></head>
 <body>
  <table cellpadding="2" cellspacing="2" width="100%"> 
   <tbody>
    <tr> 
     <td class="cell">I don't want this one...</td> 
     <td class="cell">Neither do I want this one...</td> 
     <td class="cell">Still not the right one..</td> 
     <td class="cell">BINGO!</td> 
     <td class="cell">Nothing further...</td> 
    </tr> ... 
   </tbody>
  </table>
 </body>
</html>

We want to select and parse the text from the fourth <td> element. Using td:eq(3), we select the <td> element whose index among its siblings is 3 (indexes start at 0). In the same way, td:lt(3) selects all <td> elements before index 3. As you've probably figured out, :eq means "equal to" and :lt means "less than".
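A small sketch of the two selectors against the sample row above; the class name EqLtDemo is hypothetical. Jsoup counts positions from zero among element siblings, so :eq(3) picks the fourth cell and :lt(3) the first three.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class EqLtDemo {

    // The sample row from the HTML above, without the extra whitespace
    static final String HTML = "<table cellpadding=\"2\" cellspacing=\"2\" width=\"100%\"><tbody><tr>"
            + "<td class=\"cell\">I don't want this one...</td>"
            + "<td class=\"cell\">Neither do I want this one...</td>"
            + "<td class=\"cell\">Still not the right one..</td>"
            + "<td class=\"cell\">BINGO!</td>"
            + "<td class=\"cell\">Nothing further...</td>"
            + "</tr></tbody></table>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // :eq(n): elements whose zero-based index among their siblings equals n
        Elements fourth = doc.select("td:eq(3)");
        // :lt(n): elements whose sibling index is less than n
        Elements firstThree = doc.select("td:lt(3)");
        System.out.println(fourth.size() + " cell(s) at index 3: " + fourth.text());
        System.out.println(firstThree.size() + " cell(s) before index 3");
    }
}
```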

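select() returns an Elements collection rather than a single element, and td:eq(3) matches the fourth cell of every row on the page. We only want the first match, so we call first(). We could use get(0) instead, except that get(0) throws an IndexOutOfBoundsException when nothing matched, whereas first() returns null.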
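To see how first() and get(0) behave on a selection that matches nothing, here is a sketch with a hypothetical null-safe helper, firstText, run against a one-cell table:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstVsGet {

    // Hypothetical helper: text of the first match, or null when the selection is empty
    static String firstText(Document doc, String cssQuery) {
        Element e = doc.select(cssQuery).first(); // first() returns null on an empty selection
        return e == null ? null : e.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<table><tr><td>only cell</td></tr></table>");
        System.out.println("td:eq(0) -> " + firstText(doc, "td:eq(0)"));
        System.out.println("td:eq(3) -> " + firstText(doc, "td:eq(3)")); // there is no fourth cell here
        try {
            // Elements extends ArrayList<Element>, so get(0) on an empty selection throws
            doc.select("td:eq(3)").get(0);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("get(0) on an empty selection threw " + ex.getClass().getSimpleName());
        }
    }
}
```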

So, the following code

Element e = doc.select("td:eq(3)").first();
System.out.println("Did I find it? " + e.text());

will output

Did I find it? BINGO!

There is more good reading on selectors in the Jsoup cookbook.

Daniel B