I have a project where I need to build and store large trees of data in Ruby. I am considering different approaches to serializing, deserializing and querying these trees, and I am wondering which would be the best way to go. My major constraints are read time, query efficiency, and cross-version/cross-platform compatibility. The most frequent operation is retrieving sets of nodes based on a combination of id/value and/or feature(s). Trees can be up to 15-20 levels deep. Moving subtrees is an uncommon operation, but it should be possible without too much black magic. Rails integration is not a primary concern.

The options I thought about, along with some issues I'm concerned about, are the following (I've added a few quick sketches after the list to illustrate what I mean):
- Marshal the trees, load them into memory when needed, and query them in Ruby (inefficient as the trees grow? cross-version compatibility?)
- Same as above, but use YAML (more cross-version compatible, but less efficient?)
- Same as above, but use a custom XML parser (need to recreate objects from scratch each time the tree is loaded?)
- Serialize the trees to XML, store them in an XML database (e.g. Sedna) and use XPath to query the trees (no experience with this approach, not sure about efficiency?)
- Use adjacency lists to query trees stored in a schema-less database (inefficient when counting descendants?)
- Use materialized paths (risk of exceeding the maximum string length for deep trees? the path format I have in mind is sketched below)
- Use nested sets (complex SQL queries?)
- Use the array-of-ancestors approach (sketched below). It seems interesting in terms of query efficiency according to the MongoDB page, but I haven't been able to find any serious discussion of this algorithm.
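To make the above more concrete, here are a few quick, simplified sketches; the class, attribute and file names are placeholders rather than my real code. This is roughly the node structure I am building and querying:

```ruby
class Node
  attr_accessor :id, :value, :features, :parent, :children

  def initialize(id, value, features = {})
    @id       = id
    @value    = value
    @features = features   # e.g. { kind: "leaf", lang: "en" }
    @parent   = nil
    @children = []
  end

  def add_child(node)
    node.parent = self
    @children << node
    node
  end

  # The most frequent operation: collect every node in the subtree that
  # matches a given value and/or set of features.
  def find_all(value: nil, features: {})
    hits = []
    hits << self if (value.nil? || @value == value) &&
                    features.all? { |k, v| @features[k] == v }
    @children.each { |c| hits.concat(c.find_all(value: value, features: features)) }
    hits
  end
end
```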
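Options 1 and 2 would then look roughly like this (dump the whole tree, load it back, query in memory):

```ruby
require "yaml"

tree = Node.new(1, "root")
tree.add_child(Node.new(2, "child", kind: "leaf"))

# Marshal: fast and compact, but the binary format is tied to the Ruby version.
File.binwrite("tree.marshal", Marshal.dump(tree))
from_marshal = Marshal.load(File.binread("tree.marshal"))

# YAML: human-readable and more portable, but slower and bulkier.
# (Older Psych versions use plain YAML.load here; newer ones need
# unsafe_load or permitted_classes to revive custom classes and aliases.)
File.write("tree.yml", YAML.dump(tree))
from_yaml = YAML.unsafe_load(File.read("tree.yml"))

# Either way, querying only happens once the whole tree is in memory:
from_marshal.find_all(features: { kind: "leaf" })
```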
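And options 3/4: the same tree rendered as XML and queried with XPath. I'm using Nokogiri here only to show the kind of query I would run; against an XML database like Sedna the XPath would be sent through its own interface instead:

```ruby
require "nokogiri"

# Render a Node (from the sketch above) as nested <node> elements,
# with id/value/features as attributes.
def build_xml(node, xml)
  attrs = { "id" => node.id.to_s, "value" => node.value.to_s }
  node.features.each { |k, v| attrs[k.to_s] = v.to_s }
  # Trailing underscore avoids clashing with the builder's own methods.
  xml.node_(attrs) { node.children.each { |child| build_xml(child, xml) } }
end

builder = Nokogiri::XML::Builder.new { |xml| build_xml(tree, xml) }
doc = Nokogiri::XML(builder.to_xml)

# The query I need most often: all nodes, at any depth, with a given
# value and/or feature.
doc.xpath('//node[@value="child" and @kind="leaf"]')
```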
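For materialized paths, this is the path format I have in mind, which is where my worry about string length at 15-20 levels comes from:

```ruby
# Each stored record keeps the full chain of ancestor ids as a single string.
rows = [
  { id: 1, value: "root",   path: "1" },
  { id: 2, value: "branch", path: "1/2" },
  { id: 3, value: "child",  path: "1/2/3" }
]

# Descendants of node 2: a simple prefix match (LIKE '1/2/%' in SQL terms).
descendants = rows.select { |r| r[:path].start_with?("1/2/") }

# Back-of-the-envelope: 20 levels of 10-digit ids plus separators is only
# about 220 characters per path.
```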
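Finally, the array-of-ancestors approach as I understand it from the MongoDB page, sketched with plain hashes standing in for documents in a schema-less store:

```ruby
# Each node document carries the ids of all of its ancestors.
docs = [
  { id: 1, value: "root",   features: {},                  ancestors: [] },
  { id: 2, value: "branch", features: { kind: "internal" }, ancestors: [1] },
  { id: 3, value: "child",  features: { kind: "leaf" },     ancestors: [1, 2] }
]

# All descendants of node 1, no recursion needed; in MongoDB this is roughly
# db.nodes.find({ ancestors: 1 }), which can use an index on the array.
descendants = docs.select { |d| d[:ancestors].include?(1) }

# Combined with the usual value/feature filter:
matches = docs.select { |d| d[:ancestors].include?(1) && d[:features][:kind] == "leaf" }

# Moving a subtree means rewriting the ancestors array of every node in it.
```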
Based on your experience, which approach would best fit the constraints I have described? If I go for an XML database, are there any that would be better suited to this project? Are there other approaches I have overlooked that would be more efficient? Thanks for your time.