Getting elements in order with REXML XPath

Question

I'd like to iterate through all the <HeadA> and <HeadB> elements in an XML file, and add a unique id to each. The approach I've tried so far is:

@xml.each_element('//HeadA | //HeadB') do |heading|
  #add a new id
end

The problem is, the nodeset from the XPath //HeadA | //HeadB is all the HeadAs followed by all the HeadBs. What I need is an ordered list of all the HeadAs and HeadBs in the order they appear in the document.

Just to clarify, my XML could look like this:

<Doc>
  <HeadA>First HeadA</HeadA>
  <HeadB>First HeadB</HeadB>
  <HeadA>Second HeadA</HeadA>
  <HeadB>Second HeadB</HeadB>
</Doc>

And what I'm getting from the XPath is:

  <HeadA>First HeadA</HeadA>
  <HeadA>Second HeadA</HeadA>
  <HeadB>First HeadB</HeadB>
  <HeadB>Second HeadB</HeadB>

when what I need to get is the nodes in order:

  <HeadA>First HeadA</HeadA>
  <HeadB>First HeadB</HeadB>
  <HeadA>Second HeadA</HeadA>
  <HeadB>Second HeadB</HeadB>

so I can add the ids sequentially.

Any compliant XPath engine must select the nodes in document order. Yours is obviously non-compliant. I strongly recommend *not* using it, and not mistaking its behavior for XPath. – Dimitre Novatchev, Nov 15 '10 at 21:00
@Dimitre: In fact, there is no specification enforcing the order of a resulting node set; that is the hosting language's responsibility. You are right that almost every XPath engine will use document order. – , Nov 16 '10 at 15:15

score 1 · Accepted Answer · edited Nov 16 '10 at 15:18

Ok, 2nd try, but I think I've got it this time :P

@xml.each_element('//*[self::HeadA or self::HeadB]') do |heading|
  puts heading.text
end
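
For anyone who wants to try this outside the original app, here is a minimal self-contained sketch. It assumes the sample document from the question, uses a local `doc` in place of `@xml`, and calls `REXML::XPath.match` only to collect the matches into an array. The idea is that a single `//*` step walks the elements once and the predicate filters them as it goes, so there are no separate node sets to merge:

```ruby
require 'rexml/document'

# Sample document from the question (assumed input)
xml = <<~XML
  <Doc>
    <HeadA>First HeadA</HeadA>
    <HeadB>First HeadB</HeadB>
    <HeadA>Second HeadA</HeadA>
    <HeadB>Second HeadB</HeadB>
  </Doc>
XML

doc = REXML::Document.new(xml)

# Select every element, keeping only those named HeadA or HeadB
headings = REXML::XPath.match(doc, '//*[self::HeadA or self::HeadB]')

# Collect the text of each heading in the order the engine returned them
texts = headings.map(&:text)
puts texts
```

The headings should be printed interleaved, as they appear in the document, rather than grouped by element name.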

answered Nov 15 '10 at 14:38

Doug

That did it! I've managed to turn my old 8 gnarly lines into 5 beautiful and clear lines. Thanks :) – Skilldrick Nov 15 '10 at 14:49

score 1 · Answer 2 · edited Nov 15 '10 at 16:01

Using Nokogiri to parse the XML:

xml = %q{
<Doc>
    <HeadA>First HeadA</HeadA>
    <HeadB>First HeadB</HeadB>
    <HeadA>Second HeadA</HeadA>
    <HeadB>Second HeadB</HeadB>
</Doc>
}

doc = Nokogiri::XML(xml)
doc.search('//HeadA | //HeadB').map{ |n| n.inner_text } #=> ["First HeadA", "First HeadB", "Second HeadA", "Second HeadB"]

For your task you could replace map with each or each_with_index and be almost done. Just add the code to insert the unique ID.

edited Nov 15 '10 at 16:01

Skilldrick

answered Nov 15 '10 at 15:33

the Tin Man

Thanks. I haven't used Nokogiri before, but that looks like a nice Rubyish kind of technique. – Skilldrick Nov 15 '10 at 16:02
Nokogiri rocks for XML and HTML parsing. What's especially cool about it is that you can actually use the simpler CSS accessors for a lot of XML lookups. – the Tin Man Nov 15 '10 at 18:22

score 0 · Answer 3 · answered Nov 15 '10 at 13:57

Would it work for you if you looped through each HeadA and, within each HeadA, looped through each HeadB?

@xml.each_element("//HeadA") do |headA|
  #do stuff to headA
  headA.each_element("HeadB") do |headB|
    #do stuff to headB
  end
end

answered Nov 15 '10 at 13:57

Doug

No, they're not nested. Thanks though. – Skilldrick Nov 15 '10 at 14:12
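
Since the headings are siblings rather than nested, another option that sidesteps XPath result ordering altogether is to walk the tree yourself. This is only a rough sketch: it assumes the sample document from the question and relies on REXML's `each_recursive`, which visits child elements depth-first. The `wanted` list and the `found` array are names made up for illustration:

```ruby
require 'rexml/document'

# Sample document from the question (assumed input)
doc = REXML::Document.new(<<~XML)
  <Doc>
    <HeadA>First HeadA</HeadA>
    <HeadB>First HeadB</HeadB>
    <HeadA>Second HeadA</HeadA>
    <HeadB>Second HeadB</HeadB>
  </Doc>
XML

wanted = %w[HeadA HeadB] # hypothetical list of heading names to pick up
found = []

# Depth-first walk over every element below the root
doc.root.each_recursive do |element|
  found << element.text if wanted.include?(element.name)
end

puts found
```

A depth-first walk reaches elements in the same order they are written in the file, so no sorting step is needed afterwards.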

Skilldrick · Answer 4 · 2010-11-15T14:51:04.373

I've come up with a quick and dirty solution:

as_string = @xml.to_s
counter = 0
as_string.gsub!(/(<HeadA>|<HeadB>)/) do |str|
  result = str.sub '>', " id='#{counter}'>"
  counter += 1
  result
end
@xml = REXML::Document.new as_string

It's not the prettiest or most efficient probably, but it does what I wanted it to do.

Edit: Taking D-D-Doug's advice, I've now got this:

counter = 0
@xml.each_element('//*[self::HeadA or self::HeadB]') do |heading|
  heading.attributes['id'] = "id%03d" % counter
  counter += 1
end

which is MUCH nicer.
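
Put together as something you can run on its own, it might look like the sketch below. It assumes the sample document from the question, a local `doc` standing in for `@xml`, and `each_with_index` in place of the manual counter; the `labels` array exists only to show the result:

```ruby
require 'rexml/document'

# Sample document from the question (assumed input)
doc = REXML::Document.new(<<~XML)
  <Doc>
    <HeadA>First HeadA</HeadA>
    <HeadB>First HeadB</HeadB>
    <HeadA>Second HeadA</HeadA>
    <HeadB>Second HeadB</HeadB>
  </Doc>
XML

headings = REXML::XPath.match(doc, '//*[self::HeadA or self::HeadB]')

# Give each heading a zero-padded sequential id: id000, id001, ...
headings.each_with_index do |heading, index|
  heading.attributes['id'] = format('id%03d', index)
end

# Illustration only: pair each element name with the id it received
labels = headings.map { |h| "#{h.name}:#{h.attributes['id']}" }
puts labels
```

Unlike the string-substitution version, this never re-parses the document, and it still works when a heading already carries other attributes.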
