5

I would like to combine the data in one collection using the IDs of the two arrays.

An example is shown below.

{
    "_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
       "People": [
          {
            "_id": 1,      <--- ID
            "Name": "jane"
          },
          {
            "_id": 2,      <--- ID
            "Name": "Mark"
          }
       ],
       "Contents": [
          {
            "userID":  2,   <--- People ID
            "Text": "111"
          },
          {
            "userID":  1,   <--- People ID
            "Text": "Hi"
          }
       ]
}

I want to make the above document as below.

{
    "_id": ObjectId ("5976fd2eb0adec0a32fa9831"),
    "People": [
       {
          "_id": 1,
          "Name" : "Jane"
       },
       {
          "_id": 2,
          "Name": "Mark"
       }
    ],
    "Contents": [
       {
          "userID": 2,
          "Name": "Mark",    <-- Adding
          "Text": "111",

      },
       {
          "userID": 1,
          "Name": "Jane",    <-- Adding
          "Text": "Hi",

      }
    ]
}

I have tried various things like $lookup or $unwind of .aggregate() but I cannot get the result.

Community
  • 1
  • 1
at Studio aT
  • 91
  • 1
  • 11
  • Have you tried this links ?? I hope it would help , https://stackoverflow.com/questions/37086387/join-two-collections-on-mutiple-field-using-lookup and https://www.sitepoint.com/using-joins-in-mongodb-nosql-databases/ – Tehmina Aug 11 '17 at 02:24
  • 2
    @Tehmina There should be no need to "self-join" between items in the same document. It would be really inefficient and there actually are very effective ways of doing this, without needing to effectively "re-query" the collection. – Neil Lunn Aug 11 '17 at 02:41

1 Answers1

7

You want $map and $indexOfArray ideally:

db.collection.aggregate([
  { "$addFields": {
    "Contents": {
      "$map": {
        "input": "$Contents",
        "as": "c",
        "in": {
          "userID": "$$c.userID",
          "Name": {
            "$arrayElemAt": [
              "$People.Name",
              { "$indexOfArray": [ "$People._id", "$$c.userID" ] }
            ]
          },
          "Text": "$$c.Text"
        }
      }
    }
  }}
])

Which basically grabs the value from the other array via $arrayElemAt for the matching "index" returned by $indexOfArray.

If your MongoDB needs to fall back a version without that operator, then you could use $filter instead:

db.collection.aggregate([
  { "$addFields": {
    "Contents": {
      "$map": {
        "input": "$Contents",
        "as": "c",
        "in": {
          "userID": "$$c.userID",
          "Name": {
            "$arrayElemAt": [
              { "$map": {
                "input": { 
                  "$filter": {
                    "input": "$People",
                    "as": "p",
                    "cond": { "$eq": [ "$$p._id", "$$c.userID" ] }
                  }
                },
                "as": "p",
                "in": "$$p.Name"
              }},
              0
            ]
          },
          "Text": "$$c.Text"
        }
      }
    }
  }}
])

Where basically you $filter the results down of the other array in comparison and simply return the first matching element by the 0 index with $arrayElemAt.

In either case, there is no need to "self-join" using $lookup, and that's just really unnecessary overhead best avoided.

From the document in the question you get the following:

/* 1 */
{
    "_id" : ObjectId("5976fd2eb0adec0a32fa9831"),
    "People" : [ 
        {
            "_id" : 1.0,
            "Name" : "jane"
        }, 
        {
            "_id" : 2.0,
            "Name" : "Mark"
        }
    ],
    "Contents" : [ 
        {
            "userID" : 2.0,
            "Name" : "Mark",
            "Text" : "111"
        }, 
        {
            "userID" : 1.0,
            "Name" : "jane",
            "Text" : "Hi"
        }
    ]
}

Generally speaking though, there is no such reason for any aggregation operators at all, as this sort of operation is generally best left to post-processing in the cursor. In fact since you are actually "adding" data to the document to return, it's better to do modification after the document is sent over the network.

As a common idiom of the above shown as JavaScript for the shell:

db.collection.find().map( d => 
  Object.assign(
    d,
    {
      "Contents": d.Contents.map( c => 
        Object.assign(c, 
          { "Name": d.People.map(p => p.Name)[d.People.map(p => p._id).indexOf(c.userID)] }
        )
      )
    }
  )
)

Produces the exact same result, and is generally a bit easier on the eyes to read and interpret

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317