I would like to find documents on a path in a tree structure in MongoDB using $graphLookup.

For example, let's say I have this small structure:

0 - root
└── 01 "home"
     ├── 011 "john"
     │    ├── 0111 "documents"
     │    └── 0112 "movies"
     │         ├── 01121 "Holiday Inn"
     │         └── 01122 "Harry Potter"
     └── 012 "maria"

which is stored as the following documents:

[
  {"_id":"0","name":"root","parent":""},
  {"_id":"01","name":"home","parent":"0"},
  {"_id":"011","name":"john","parent":"01"},
  {"_id":"0111","name":"documents","parent":"011"},
  {"_id":"0112","name":"movies","parent":"011"},
  {"_id":"01121","name":"Holiday Inn","parent":"0112"},
  {"_id":"01122","name":"Harry Potter","parent":"0112"},
  {"_id":"012","name":"maria","parent":"01"}
]

Is it possible to use $graphLookup (potentially with other steps in the aggregation) to

  • check that a given path (e.g. "root/home/maria") exists?
  • retrieve all documents on a target path (e.g. "root/home/john/movies/Harry Potter")?
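
To make the two goals above concrete, here is a minimal sketch of one non-recursive way to express them: a pipeline with one $lookup stage per path segment, each matching on both the parent id and the expected name. This uses $lookup rather than $graphLookup, so it is an alternative and not an answer to the $graphLookup question itself. The helper names `splitPath` and `buildPathPipeline` and the collection name "nodes" are assumptions for illustration, and the sketch assumes that sibling names are unique and that the root has an empty `parent`, as in the sample data.

```javascript
// Split "root/home/maria" into ["root", "home", "maria"].
function splitPath(path) {
  return path.split("/").filter((s) => s.length > 0);
}

// Build an aggregation pipeline that resolves one path segment per $lookup.
// `collection` is the (assumed) name of the collection that holds the tree.
function buildPathPipeline(segments, collection) {
  if (segments.length === 0) {
    throw new Error("path must contain at least one segment");
  }
  // Start at the root document; the sample data marks it with parent "".
  const pipeline = [{ $match: { name: segments[0], parent: "" } }];
  let parentIdField = "$_id";
  for (let i = 1; i < segments.length; i++) {
    const as = "level" + i;
    pipeline.push({
      $lookup: {
        from: collection,
        let: { pid: parentIdField },
        pipeline: [
          {
            $match: {
              $expr: {
                $and: [
                  { $eq: ["$parent", "$$pid"] },
                  // $literal keeps names starting with "$" from being
                  // interpreted as field paths.
                  { $eq: ["$name", { $literal: segments[i] }] },
                ],
              },
            },
          },
        ],
        as: as,
      },
    });
    // A plain $unwind drops the document when no child matched, so a
    // missing segment makes the whole pipeline return nothing.
    pipeline.push({ $unwind: "$" + as });
    parentIdField = "$" + as + "._id";
  }
  return pipeline;
}
```

With this sketch, something like `db.nodes.aggregate(buildPathPipeline(splitPath("root/home/maria"), "nodes"))` would return one document if the path exists and none otherwise; the documents on the path are the root itself plus the `level1`, `level2`, ... fields. Whether each step is served by an index (for example a compound index on `parent` and `name`) is something to verify with `explain` rather than assume.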

In principle it should be possible. But I'm not sure whether $graphLookup can do it efficiently, that is, by checking at each step of the recursion that we are looking for a document with a specific name. The inefficient alternative would be to generate all paths of length N (starting at the "root" folder) and pick the matching one. Another inefficient option would be to store the path in the documents themselves, but that would require updating many child documents whenever something is renamed. I would very much like to avoid both of these solutions, but I'm unable to figure out whether an efficient query can be written for MongoDB.
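
For reference, the closest thing to a per-step name check that I can sketch with $graphLookup itself is below. As far as I can tell, `restrictSearchWithMatch` is a static query filter and cannot depend on the current depth, so the sketch only restricts the traversal to nodes whose name appears anywhere in the path, caps the recursion with `maxDepth`, and then validates the actual chain on the client. The helper names `buildGraphLookupPipeline` and `pickPath`, the output field `candidates`, and the collection name "nodes" are hypothetical, and sibling names are assumed to be unique.

```javascript
// Build a pipeline that starts at the root and recursively follows
// _id -> parent links, keeping only nodes whose name occurs in the path.
function buildGraphLookupPipeline(segments, collection) {
  const pipeline = [{ $match: { name: segments[0], parent: "" } }];
  if (segments.length < 2) {
    return pipeline; // nothing to traverse below the root
  }
  pipeline.push({
    $graphLookup: {
      from: collection,
      startWith: "$_id",
      connectFromField: "_id",
      connectToField: "parent",
      as: "candidates",
      // Depth 0 holds the children of the root, so the last of N
      // segments sits at depth N - 2.
      maxDepth: segments.length - 2,
      // Static filter: it cannot express "name must equal segment i at
      // depth i", so it may still over-fetch when names repeat.
      restrictSearchWithMatch: { name: { $in: segments.slice(1) } },
    },
  });
  return pipeline;
}

// Client-side check: follow the parent chain through the candidates and
// return the documents on the path, or null if a segment is missing.
function pickPath(rootDoc, segments) {
  const path = [rootDoc];
  let parentId = rootDoc._id;
  for (let i = 1; i < segments.length; i++) {
    const next = (rootDoc.candidates || []).find(
      (d) => d.parent === parentId && d.name === segments[i]
    );
    if (!next) {
      return null;
    }
    path.push(next);
    parentId = next._id;
  }
  return path;
}
```

The over-fetching is bounded by how often the path's names occur elsewhere within the first N levels of the tree, which is better than generating every path of length N but still not the strict per-level filtering described above.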

Complications:

  • a folder may have potentially many children, so it may be easier to model the structure using parent references,
  • users may rename things, so it wouldn't be efficient to store the path in the documents themselves.
Bartosz
  • @turivishal thank you for the quick response! I've seen this question before. The user there wanted to transform all the data into a different structure, so not much filtering is needed at each level of recursion. Here, the problem is different: given a filesystem path, find the documents on this path. Performance is crucial, so I want to avoid generating many intermediate lists of documents that would then be discarded. I'm afraid the solution there would degrade badly as the number of documents grows. What do you think? – Bartosz Jul 22 '21 at 13:21
  • OK, got it. I don't think there is any way to handle your situation. – turivishal Jul 22 '21 at 14:27
