For an analogy, pretend we are a Lego company.
We maintain a table of Bricks that can be used to build "designs" for Lego Sets. A design could be explained as a Tree of Nodes, where a Node could be an Assembly of Bricks or a single Brick.
So for a Lego Parrot, a Tree would look like:
Parrot
├─ Head
│ ├─ Eyes
│ │ ├─ Black Brick
│ │ ├─ Black Brick
│ ├─ Beak
│ │ ├─ Black Brick
├─ Body
│ ├─ Chest
│ │ ├─ Yellow Brick
│ │ ├─ Yellow Brick
│ ├─ Wings
│ │ ├─ Blue Brick
│ │ ├─ Blue Brick
│ │ ├─ Blue Brick
├─ Tail
│ ├─ Yellow Brick
│ ├─ Blue Brick
The reason we want to lay it out this way is that each Assembly of Bricks could, for example, have instructions associated with it.
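To make the structure concrete, here is a minimal in-memory sketch of such a design tree. The `Node` class, its `instructions` field, and the instruction text are made up for illustration; only the Tail assembly from the Parrot is built.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """A node in a design tree: an Assembly (has children) or a single Brick (leaf)."""

    name: str
    children: list[Node] = field(default_factory=list)
    instructions: str | None = None  # hypothetical: build notes attached to an Assembly

    @property
    def is_brick(self) -> bool:
        # In this sketch a Brick is simply a node without children.
        return not self.children

    def walk(self, depth: int = 0) -> Iterator[tuple[int, Node]]:
        """Yield (depth, node) pairs in depth-first, pre-order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


tail = Node(
    "Tail",
    [Node("Yellow Brick"), Node("Blue Brick")],
    instructions="Attach to the back of the body",  # made-up instruction text
)

for depth, node in tail.walk():
    print("  " * depth + node.name)
```

Treating "no children" as "is a Brick" is a simplification; a real schema would more likely reference a row in the Bricks table.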
Now the problem we are facing is that we want to store thousands of designs in a single database, and we are worried about query efficiency when querying a single design/tree. You would first need to know all of the thousands of Root Nodes, and from there, traversing one tree seems incredibly expensive in a relational table that also holds thousands of other trees.
Our first attempt is to store these trees in a single relational DB (Postgres) using the Adjacency List Model (each node just holds a reference to its parent).
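For reference, this is roughly what the Adjacency List Model looks like, together with a recursive query that pulls one tree or subtree out of the shared table. It is only a sketch: it uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs without a server, and the table name `node`, its columns, and the helper `fetch_subtree` are assumptions, not our actual schema. The `WITH RECURSIVE` syntax is also accepted by Postgres.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for a Root Node
        name      TEXT NOT NULL
    );
    -- Index so that "children of X" does not scan every tree in the table.
    CREATE INDEX node_parent_idx ON node(parent_id);
""")

# Part of the Parrot design: (id, parent_id, name).
rows = [
    (1, None, "Parrot"),
    (2, 1, "Head"), (3, 2, "Eyes"), (4, 3, "Black Brick"), (5, 3, "Black Brick"),
    (6, 2, "Beak"), (7, 6, "Black Brick"),
    (8, 1, "Tail"), (9, 8, "Yellow Brick"), (10, 8, "Blue Brick"),
]
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)

SUBTREE_SQL = """
    WITH RECURSIVE subtree(id, parent_id, name, depth) AS (
        -- Anchor: the node we start from (a Root Node or any Assembly).
        SELECT id, parent_id, name, 0 FROM node WHERE id = ?
        UNION ALL
        -- Recursive step: children of the nodes found so far.
        SELECT n.id, n.parent_id, n.name, s.depth + 1
        FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT id, name, depth FROM subtree ORDER BY id
"""


def fetch_subtree(root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()


for node_id, name, depth in fetch_subtree(2):  # the "Head" subtree
    print("  " * depth + name)
```

Calling `fetch_subtree(1)` returns the whole Parrot, while `fetch_subtree(2)` returns only the Head. The query starts from a single node id, so in this sketch it only ever touches rows reachable from that node.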
But it was brought to our attention that a non-relational database might be better here: because of the built-in nesting, it seems easier to query and traverse a tree.
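As an illustration of what "built-in nesting" would mean, here is a sketch of one design stored as a single nested document, written as a plain Python dict rather than against any particular document database. The field names `name` and `children` and the helper `find_subtree` are assumptions for the example.

```python
import json

# One design as one nested document (part of the Parrot only).
parrot_doc = {
    "name": "Parrot",
    "children": [
        {"name": "Head", "children": [
            {"name": "Eyes", "children": [{"name": "Black Brick"}, {"name": "Black Brick"}]},
            {"name": "Beak", "children": [{"name": "Black Brick"}]},
        ]},
        {"name": "Tail", "children": [{"name": "Yellow Brick"}, {"name": "Blue Brick"}]},
    ],
}


def find_subtree(doc, name):
    """Return the first node called `name`, searching depth-first, or None."""
    if doc["name"] == name:
        return doc
    for child in doc.get("children", []):
        found = find_subtree(child, name)
        if found is not None:
            return found
    return None


print(json.dumps(find_subtree(parrot_doc, "Beak")))
# → {"name": "Beak", "children": [{"name": "Black Brick"}]}
```

In this shape, reading a full Tree is a single document fetch, but getting a subtree means walking the document (here in application code) to find the node first.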
So what would be the best way to persist these trees?
- We will have tens of thousands of trees
- We want to be able to query and traverse a tree quickly, so that we can build a UI that visualizes a full Tree
- Users might also want to visualize just a subtree
- We want to keep the ability to persist millions of Trees in the future