4

What is the best way to build an XML tree in Ruby if you have an Array of string paths?


paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]
xml = 
<nodeA1>
    <nodeB1>
        <nodeC1>
            <nodeD1>
                <nodeE1/>
            </nodeD1>
        </nodeC1>
        <nodeC2/>
    </nodeB1>
    <nodeB2>
        <nodeC2/>
        <nodeC3/>
    </nodeB2>
</nodeA1>

My first thought is to to split the path string to an array, and compare its depth and content to the previous array, but then if I get to path "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1", when I go back down to "nodeA1/nodeB1/nodeC2", the [1] node is the common ancestor, but keeping track of that is messy, the way I've been doing it at least.

I would like to make it recursive also, so I could process each nest level in its own function, but haven't come to any semi-universal solution yet.

Any ideas or things you guys commonly do when you run into this problem?

Thanks! Lance

MartinStettner
  • 28,719
  • 15
  • 79
  • 106
Lance
  • 75,200
  • 93
  • 289
  • 503

3 Answers3

5

REXML is your friend! You're getting XPaths, so use 'em!

require 'rexml/document'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

x = REXML::Document.new
x.elements << "xml"

paths.each do |p|
  steps = p.split(/\//)
  steps.each_index do |i|
    unless REXML::XPath.first(x,"/xml/" + steps[0..i]*"/")
      REXML::XPath.first(x,"/xml/" + steps[0...i]*"/").elements << steps[i]
    end
  end
end
puts x.to_s

Note that your example data has both nodeA1 and nodeA3 at the top level, so I started with a root called "xml" here. If the "3" was a typo, and nodeA1 was really your root (as your sample XML output suggests), you can delete the 'x.elements << "xml"' line and change all the "/xml/"s to just "/".

glenn mcdonald
  • 15,290
  • 3
  • 35
  • 40
  • Very cool! I've been using nokogiri lately because there's so many negative things about REXML, but this makes me want to give it a shot. Why do you choose REXML over hpricot or nokogiri, or could you use em all together? – Lance Oct 01 '09 at 03:22
  • Rexml ships with Ruby installation otherwise I believe you can use either hpricot or nokogiri – khelll Oct 01 '09 at 03:29
  • Right, I used REXML because you have it already, and it works fine for the task at hand. No judgment vs hpricot or nokogiri implied. – glenn mcdonald Oct 01 '09 at 12:16
4

This is very similar to this question. Here's a modified version based upon sris's answer:

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current  = current[sub_path]
    sub_path
  end
end

def make_tree(prefix, node)
  tree = ""
  node.each_pair do |path, subtree| 
    tree += "#{prefix}<#{File.basename(path)}"
    if subtree.empty?
      tree += "/>\n"
    else
      tree += ">\n"
      tree += make_tree(prefix + "\t", subtree) unless subtree.empty?
      tree += "#{prefix}</#{File.basename(path)}>\n"
    end
  end
  tree
end

xml = make_tree "", tree
print xml

Edit:

Here is a modified version that builds an actual XML document using Nokogiri. I think it's actually easier to follow than the string version. I also removed the use of File, because you don't actually need it to meet your needs:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").each do |name|
    current[name] ||= {}
    current  = current[name]
  end
end

def make_tree(node, curr = nil, doc = Nokogiri::XML::Document.new)
  #You need a root node for the XML.  Feel free to rename it.
  curr ||= doc.root = Nokogiri::XML::Node.new('root', doc)
  node.each_pair do |name, subtree|
      child = curr << Nokogiri::XML::Node.new(name, doc)
      make_tree(subtree, child, doc) unless subtree.empty?
  end
  doc
end

xml = make_tree tree
print xml

Edit 2:

Yes, it is true that in Ruby 1.8 hashes aren't guaranteed to maintain insertion order. If that's an issue, there are ways to work around it. Here's a solution that retains order but doesn't bother with recursion and is much simpler for it:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

doc = Nokogiri::XML::Document.new
doc.root = Nokogiri::XML::Node.new('root', doc)

paths.each do |path|
  curr = doc.root
  path.split("/").each do |name|
    curr = curr.xpath(name).first || curr << Nokogiri::XML::Node.new(name, doc)
  end
end

print doc
Community
  • 1
  • 1
Pesto
  • 23,810
  • 2
  • 71
  • 76
  • this looks interesting, I'll try that out. The only thing is that I would like to use the xml libraries instead of using strings. – Lance Oct 01 '09 at 03:24
  • 1
    I have modified it to create an actual Nokogiri XML document. Hope that helps. – Pesto Oct 01 '09 at 04:14
  • If you're doing this in Ruby 1.8x, creating the intermediate hash loses the order of the children below any given parent. So if order matters, this solution only works in 1.9. – glenn mcdonald Oct 01 '09 at 12:18
  • That's not a requirement of the question, but it's a fair point. – Pesto Oct 01 '09 at 13:29
  • That Edit 2 version is nice. Much faster than mine, even ignoring the REXML/Nokogiri difference. – glenn mcdonald Oct 01 '09 at 17:41
  • Edit 2 is super sick, man. I'm gonna try that out immediately. – Lance Oct 01 '09 at 19:19
1

Looks like another version of this question.

So you could just define a tree structure and create the nodes for each string in the list. Then write an output method which prints out the tree as xml.

If you want to go without defining a tree structure, you have to make sure that the list is sorted as in your example. Then loop over the list and compare each line with the previous one:

  • For all nodes in the previous line that are not part of the current one, write a closing tag (in reverse order)
  • For all nodes in the current line that are not part of the previous line, write an opening tag.

This solution cannot produce self-closing tags ("<nodeE1/>") since this requires comparing with the previous and the next line.

And this solution is not recursive, but I think that the problem isn't a recursive one neither... (or I just didn't understand exactly, why you wanted a recursive function)

Community
  • 1
  • 1
MartinStettner
  • 28,719
  • 15
  • 79
  • 106