4

I was studying tree algorithms and almost all algorithms use recursion for traversing of course traversal can be done without recursion as well (by creating stack data structure and while loop). But out of curiosity wanted to know how these tree data structures are traversed when there are millions or billions of nodes exist in tree ? of course these questions are asked in interviews as well.

Some of the approaches I can just think of are

  • Store tree in multiple files as different subtrees and traverse through files
  • Distribute tree across different machines
  • Store tree in database in table structure and design query for traversal

Any better approaches, if any one could share link to study material for such kind of problems will be help.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
Sandeep
  • 320
  • 2
  • 4
  • 18

1 Answers1

1

If the tree fits in memory, you can just walk it. I build tools that build ASTs with millions of nodes (both from lots of trees, and sometimes from incredibly deep trees); we store our trees in memory. A recursive walk works just fine. And, it only takes tens on nanoseconds per node (cache line miss time) to do it, if done right.

Fixed sized stacks usually screw this up because such stacks prevent arbitrarily deep recursion. See How does a stackless language work? The languages in which I code tree operations don't have fixed size stacks.

You can store the tree distributed across machines or (worse!) in database. You can still walk over that tree, but the algorithms are clumsier, and the extra delays in communication (to remote machines, to database tables) make this into such a slow operation, that hardly anybody does it.

Community
  • 1
  • 1
Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • 1
    Thanks Ira for answering, What do you mean by non fixed size stacks, mostly when we do recursion , system stack is used and it is fixed ? may be I did not understand something ? – Sandeep Feb 21 '16 at 00:50
  • 1
    When we do recursion, we use *some* stack. If it defined as contiguous storage, usually it has preallocated size. Read the provided link. – Ira Baxter Feb 21 '16 at 01:35