Can anybody throw me a line?
I'm using Ruby and Nokogiri to parse a document like this (fragment):
...
<dt>DOUE:</dt>
<dd>
<a href="http://ted.europa.eu">Accés al DOUE</a>
- 19/07/11
</dd>
<dt class="multi-linia">Criteris d'adjudicació:</dt>
<dd class="info-tabulada">
<strong>Ponderació:</strong>
50.00 -
<strong>Criteri:</strong>
oferta econòmica
</dd>
<dd class="info-tabulada">
<strong>Ponderació:</strong>
40.00 -
<strong>Criteri:</strong>
prestacions tècniques i funcionals
</dd>
<dd class="info-tabulada">
<strong>Ponderació:</strong>
10.00 -
<strong>Criteri:</strong>
altres elements
</dd>
<dt>another dt now</dt>
<dd>and its corresponding dd too</dd>
...
Normally the dt and dd elements alternate, one dd after each dt, and in those cases parsing is quite straightforward. But, as in the example, this rule is sometimes broken: more than one dd element can appear between two dt elements.
To parse this list I have a variable called area pointing to it, and I do this:
area.search("dt").each do |dt|
  dd = dt.search("./following-sibling::dd[1]/text()")
  puts "#{clear_string(dt.text)}: #{clear_string(dd.text)}"
end
where clear_string() is a simple function that drops unnecessary space characters.
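For reference, a minimal sketch of what such a helper might look like. The body is an assumption for illustration (collapse every run of whitespace into one space and trim both ends); the real clear_string() may differ.

```ruby
# Hypothetical stand-in for clear_string(): collapses runs of whitespace
# (spaces, tabs, newlines) into a single space and trims both ends.
def clear_string(str)
  str.gsub(/\s+/, " ").strip
end

puts clear_string("  50.00 -\n   ")
```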
When parsing, I would like to associate each dt's text with the text of every dd that follows it, up until the next dt. BTW, for the dd elements I only want to keep their own text, not the text of their children. How should I do that?