
I have a rather complicated situation that I want to solve. I have a very dynamic model that allows referencing other objects in the same collection. A reference may be a single value or an array of references. Also, the field names are not known in advance (in this example I named them ref1 and ref2, but they may be anything). What identifies a reference object is the existence of a field with the key referenceId. I actually did resolve that part (see further below). Sample data would look like this (I chose strings instead of ObjectIds for the sake of simplicity):

[
  {
    "id": "1",
    "name": "First"
  },
  {
    "id": "2",
    "name": "Second",
    "ref1": {
      "referenceId": "3"
    },
    "ref2": [
      {
        "referenceId": "1",
      },
      {
        "referenceId": "3"
      }
    ]
  },
  {
    "id": "3"
    "name": "Third",
  }
]
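
As an aside, the detection rule above (a value is a reference when it, or each of its array elements, carries a referenceId key) can be sketched as a small predicate. Plain JavaScript; the name isReference is mine, not part of the model:

```javascript
// A value counts as a reference when it (or every element of it,
// if it is an array) is an object carrying a referenceId key.
const isReference = (v) => {
  const items = Array.isArray(v) ? v : [v];
  return items.length > 0 &&
    items.every((e) => e !== null && typeof e === "object" && "referenceId" in e);
};
```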

The desired result would be (after the lookups have been performed and the resulting objects replaced inline) - here for the object with id = 2:

[
  {
    "id": "2",
    "name": "Second",
    "ref1":   {
      "id": "1",
      "name": "First"
    },
    "ref2": [
      {
        "id": "1",
        "name": "First"
      },
      {
        "id": "3"
        "name": "Third",
      }
    ]
  }
]

After a lot of experimenting, I came up with this idea for an aggregation pipeline, which produces the following output:

[
  {
    "id": "2",
    "name": "Second",
    "ref1": {
      "id": "3",
      "name": "Third"
    },
    "ref2": [
      {
        "referenceId": "1"
      },
      {
        "referenceId": "3"
      }
    ]
  },
  {
    "id": "2",
    "name": "Second",
    "ref1": {
      "referenceId": "3"
    },
    "ref2": {
      "id": "1",
      "name": "First"
    }
  },
  {
    "id": "2",
    "name": "Second",
    "ref1": {
      "referenceId": "3"
    },
    "ref2": {
      "id": "3",
      "name": "Third"
    }
  }
]

This is pretty close to what I want; however, I am stuck at this stage with the following problems:

  • How can I remove the unwanted fields? Because of the $unwind stages for the $lookup, each result document still carries the unresolved duplicate of the other reference field (ref1 where ref2 was looked up, and vice versa).
  • How do I group the result back into one object per id, given the dynamic (aka unpredictable) field names (in this case ref1 and ref2)?
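The second problem amounts to folding the unwound { ...doc, k, v } rows back into one document per id, roughly what $group with $push of { k, v } pairs followed by $arrayToObject would do. A minimal plain-JavaScript sketch of that fold, for clarity; the function names are hypothetical, and the rows are assumed to be shaped like the pipeline output above with k/v added:

```javascript
// True when a value is still an unresolved reference (or an array of them).
const stillUnresolved = (v) => {
  const items = Array.isArray(v) ? v : [v];
  return items.every((e) => e !== null && typeof e === "object" && "referenceId" in e);
};

// Fold the unwound rows (one per resolved reference) back into one
// document per id: the in-memory equivalent of $group with
// $push: { k, v } followed by $mergeObjects/$arrayToObject.
function regroup(rows) {
  const byId = new Map();
  for (const { k, v, ...doc } of rows) {
    if (!byId.has(doc.id)) {
      // Drop the leftover unresolved reference fields (problem 1).
      byId.set(doc.id, Object.fromEntries(
        Object.entries(doc).filter(([, val]) => !stillUnresolved(val))
      ));
    }
    const acc = byId.get(doc.id);
    // A key seen more than once (ref2) was an array before $unwind.
    acc[k] = k in acc ? [].concat(acc[k], v) : v;
  }
  return [...byId.values()];
}
```

One caveat: a reference array that happened to contain a single element comes back as a scalar here, since the fold cannot tell the difference; the original type would have to be carried through to restore it.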

Is this doable in MongoDB? Am I on the right path, or is this something I should rather do in client (as in backend) code? I would prefer to solve it in the aggregation, since doing it in client code would take a performance hit.
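For comparison, the backend alternative is fairly small: fetch once, then resolve everything in memory. A minimal plain-JavaScript sketch (function name hypothetical), assuming the whole referenced set fits in one query result:

```javascript
// Replace every { referenceId } value (single or array) with the
// referenced document, using one in-memory index over the collection.
function resolveReferences(docs) {
  const byId = new Map(docs.map((d) => [d.id, d]));
  const resolve = (v) => {
    if (Array.isArray(v)) return v.map(resolve);
    if (v !== null && typeof v === "object" && "referenceId" in v) {
      return byId.get(v.referenceId);
    }
    return v;
  };
  return docs.map((d) =>
    Object.fromEntries(Object.entries(d).map(([k, v]) => [k, resolve(v)]))
  );
}
```

Note that the resolved values are shared object references into the fetched set, not copies; clone them if the result is mutated afterwards.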

Here are the steps I have taken so far:

Mongo playground

  1. Convert the document into an array via $objectToArray, keeping only the keys I am interested in by checking for the existence of the referenceId key. If present, return the [key, value] pair, otherwise null (matching the 2nd object from the data above):
  {
    $match: {
      id: "2"
    }
  }
  {
    $addFields: {
      "references": {
        $map: {
          input: {
            "$objectToArray": "$$ROOT"
          },
          as: "item",
          in: {
            $cond: {
              if: {
                $ne: [
                  "$$item.v.referenceId",
                  undefined
                ]
              },
              then: [
                "$$item.k",
                "$$item.v.referenceId"
              ],
              else: null
            }
          }
        }
      }
    }
  }
  2. Filter out the null values for the subsequent map:
  {
    "$addFields": {
      "references": {
        $filter: {
          input: "$references",
          cond: {
            $ne: [
              "$$this",
              null
            ]
          }
        }
      }
    }
  }
  3. For the $lookup I need fixed field names, so convert the pairs into an array of {"k": <key>, "v": <referenceId>} objects:
  {
    "$addFields": {
      "references": {
        $map: {
          input: "$references",
          as: "item",
          in: {
            k: {
              "$arrayElemAt": [
                "$$item",
                0
              ]
            },
            v: {
              "$arrayElemAt": [
                "$$item",
                1
              ]
            }
          }
        }
      }
    }
  }
  4. Unwind the results to have a single v field to perform the lookup with:
  {
    $unwind: "$references"
  },
  {
    $unwind: "$references.v"
  }
  5. Perform the $lookup into a resolved field, unwind it, and add the resolved document together with its key (ref1/ref2 in my example above) back onto the result document:
  {
    $lookup: {
      "from": "collection",
      "localField": "references.v",
      "foreignField": "id",
      "as": "resolved"
    }
  },
  {
    $unwind: "$resolved"
  },
  {
    "$addFields": {
      "k": "$references.k",
      "v": "$resolved"
    }
  },

and finally remove the temporary fields:

  {
    $project: {
      resolved: 0,
      references: 0,
    }
  },
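For completeness, one possible direction for both problems at once: append further stages after the k/v $addFields that group the rows back together and rebuild the document with $arrayToObject. This is an untested sketch, not verified against the playground above; note that it turns every reference into an array, so single-value references (ref1) would need their original type carried through the pipeline to be restored:

```javascript
// Hypothetical continuation stages. Grouping on { id, k } first collects
// array references (ref2) back together; $arrayToObject would otherwise
// keep only the last value for a duplicated k.
const regroupStages = [
  { $group: { _id: { id: "$id", k: "$k" },
              doc: { $first: "$$ROOT" },
              vs:  { $push: "$v" } } },
  { $group: { _id: "$_id.id",
              doc:  { $first: "$doc" },
              refs: { $push: { k: "$_id.k", v: "$vs" } } } },
  // $mergeObjects lets the resolved references overwrite the leftover
  // unresolved ref1/ref2 fields still present in the document.
  { $replaceRoot: { newRoot: { $mergeObjects: ["$doc", { $arrayToObject: "$refs" }] } } },
  // Drop the temporary k/v fields captured inside $$ROOT.
  { $project: { k: 0, v: 0 } }
];
```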

Anyone up for the challenge or am I just trying too hard? ;-)

  • This is a really bad database structure; I would suggest changing it to one that fulfils your result criteria. Also, duplicate data does not matter in MongoDB: just add the `name` field next to `referenceId` so you don't need the lookup/join process. – turivishal May 19 '21 at 05:23
  • Hi, this is just an example with name being the only attribute for the sake of demonstration. The objects can have an arbitrary number of fields. It’s not about saving space, it’s about potential updates. I completely understand the idea of having data denormalized in mongo. However, I want to avoid performing updates on every write on every possible reference. This would mean that every time I update the “original” I would have to update all (potentially thousands) of “links” to it. I don’t think this is a good approach either, apart from eventual inconsistencies. – OGordo May 19 '21 at 10:43
  • OK, so you can make a dynamic structure instead of fixed keys, e.g. an array of references `{ reference: [ { k: "ref1", v: "1" }, { k: "ref2", v: "2" } ] }` in key-value format, following the attribute pattern. – turivishal May 19 '21 at 11:05
  • When you are thinking about thousands of documents to update every time, your current structure will not survive in thousands of documents either, because every query has to deconstruct the array just to find which key holds the reference; one day that will take more execution time and memory. – turivishal May 19 '21 at 11:12
  • Your suggested data structure is actually what I had designed at first, but considered it an "ugly" solution, since it requires client side transformation to create the data structure I want to have in the client (group the resulting array of references by key). This is what I wanted to avoid. I came here to find out whether the (pretty powerful) aggregation pipeline offers means to perform the transformation there. – OGordo May 19 '21 at 11:21
  • The thousands of documents refer to the amount of times a document can be referenced. This means on every fetch the resulting transformation is not heavy, it just happens possibly a lot of times. Think of it as one post someone posted here on stackoverflow. And other posts may reference it. You don't want to include the data since the posts may change. And imagine the post is referenced a lot of times since it is currently a hot topic. The thing is it may be a post, an image, a blob, whatever you can think of. Hence the dynamic naming and a future proof approach for updates on functionality. – OGordo May 19 '21 at 11:25

0 Answers