I am trying to extract a list of DIVs whose class is "child" and associate each "child" with the DIV whose class is "header" that occurs before it.

For example:

<div class=header>HEADER A</div>
<div class=child>CHILD A.1</div>
<div class=child>CHILD A.2</div>
<div class=child>CHILD A.3</div>
<div class=header>HEADER B</div>
<div class=child>CHILD B.1</div>
<div class=child>CHILD B.2</div>
<div class=child>CHILD B.3</div>

I expect output like the following:

HEADER A --> CHILD A.1
HEADER A --> CHILD A.2
HEADER A --> CHILD A.3
HEADER B --> CHILD B.1
HEADER B --> CHILD B.2
HEADER B --> CHILD B.3
Phrogz
iwan
  • Note that if you convert your HTML to use semantic headers (e.g. `<h1>`) I have [an answer here](http://stackoverflow.com/questions/7827562/use-xpath-to-group-siblings-from-an-html-xml-document/7829248#7829248) that automatically groups headers and following siblings into sections. – Phrogz Oct 26 '11 at 13:34
  • …and you can easily switch to them via: `doc.css('div.header').each{ |head| head.name = 'h1' }` ;) – Phrogz Oct 27 '11 at 03:14

2 Answers

Just store the previous header element:

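header = ""
# `xml` is assumed to be an already-parsed Nokogiri document
xml.xpath("//div").each do |node|
  if node['class'] =~ /header/
    # Remember the text of the most recent header
    header = node.text
  else
    # Every other DIV is paired with the last header seen
    puts header + " --> " + node.text
  end
end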
Tatu Lahtela
  • Thanks Tatu, I believe your suggestion works for the simplified case in the question description. Cheers – iwan Oct 25 '11 at 12:36

A more 'xpathy' version:

doc.xpath('//div[@class="child"]').each do |node|
  # [1] on the reverse preceding-sibling axis picks the nearest header above this child
  header = node.at('./preceding-sibling::div[@class="header"][1]')
  puts header.text + " --> " + node.text
end
pguardiario