1

I want to implement a parallel DFS search in a tree. We have a (potentially huge) tree of business objects, and the user can search by a textId (or part of the textId). This is implemented in a serial DFS search at the moment. Which is a huge waste of resources. Our customers have 8 logical cores. No need to wait 5 minutes for a search result...

We already have a global ThreadManager. We use it mainly for parallel calculations of grid cells. The ThreadManager keeps track of how many tasks are queued, how many threads are available and starts the next queued task when a new thread is available.

This idea is to use this with a new task class to parallelize the tree search. Of course, I cannot start a new task on every childNode - that would mean hundreds or thousands of tasks queued. But only parallelize at a high tree level would underuse the cores when a task has only a small subtree.

I have the following idea for the task class:

One object of the task class knows its treeNode, and a common result object. The tasks "execute" method does the following:

if result object already has a result:
  return

match treeNode with the search condition. If match:
  put treeNodes object in the result object.
  return

for each childNode of treeNode:
  if the ThreadManager has a thread available:
    Create a new task with childNode and put the task on the queue
  else:
    call "execute" for every childNode

The synchronization of the completed tasks seems complicated. For example, I have to know when the tree is searched but the target textID is not found. But it should be possible to expand the ThreadManager to known if there are still tasks which belong to this search.

Does someone have experience with that kind of algorithm? Will the synchronization overhead be too much to be worth it? Are there other pitfalls I am not seeing?

This discussion is similar: Depth first search in parallel where "stack" or "global worklist" is my "ThreadManager". Did I get this right?

Thank you!

cepheus
  • 44
  • 7

0 Answers0