3

I want to set the value of a new field in every document of a "destination" collection, based on the value of a field in documents of another "source" collection.

Documents of the destination collection look like this:

db.names:

{ 
    "name" : "Larry", 
    "info" : {
        "phone" : "5551212"
    }
}
{ 
    "name" : "Curly", 
    "info" : {
        "phone" : "5551213"
    }
}
{ 
    "name" : "Moe", 
    "info" : {
        "phone" : "5551234"
    }
}

Documents of the second collection look like this (I want this timezone to be added to the destination collection):

db.phones:

{ 
    "phone" : "5551212", 
    "timezone" : "UTC-6" 
}

{ 
    "phone" : "5551213", 
    "timezone" : "UTC-7" 
}

{ 
    "phone" : "5551234", 
    "timezone" : "UTC-6" 
}

I want the documents of the first, "destination", collection to end up looking like this:

db.names:

{ 
    "name" : "Larry", 
    "info" : {
        "phone" : "5551212",
        "timezone" : "UTC-6"
    }
}
{ 
    "name" : "Curly", 
    "info" : {
        "phone" : "5551213",
        "timezone" : "UTC-7"
    }
}
{ 
    "name" : "Moe", 
    "info" : {
        "phone" : "5551234",
        "timezone" : "UTC-6"
    }
}

In other words, I have a very large collection (phones) that includes timezones and a very large collection(names) that does not, and I want the first collection to include those timezones, using the phone number in both as the key.

I have tried this in mongoShell with no luck:

list = db.names.aggregate([
    { $match: { } },

    { $lookup: {  
                from: "phones",
                localField: "info.phone",
                foreignField: "phone",
                as: "zoneinfo"
            }
    }
]);

list.result.forEach(function(x) {
    db.names.update({_id:x._id}, {$set:{'info.timezone':'zoneinfo.timezone'}});
});

So, link the collection of timezones and add it to list, the result, as a new field on each document (this much works). Then, since we can't do an update in an aggregate, iterate over the resulting documents, adding a new permanent field, info.timezone, from the 'temporary' one, zoneinfo.timezone, added in the previous operation.

What am I doing wrong here? Is there some other approach which would be preferable? There are thousands of documents in each collection, so hand work is not desirable.

John Marquez
  • 615
  • 6
  • 23

1 Answers1

2

One thing you need to keep in mind is that $lookup returns zoneInfo as an array so in order to use that field you need to run $unwind on it. Then you can simply reshape your document using $addFields and $project. Try:

db.names.aggregate([
    {
        $lookup: {
            from: "phones",
            localField: "info.phone",
            foreignField: "phone",
            as: "phoneDetails"
        }
    },
    {
        $unwind: "$phoneDetails"
    },
    {
        $addFields: {
            "info.timezone": "$phoneDetails.timezone"
        }
    },
    {
        $project: {
            phoneDetails: 0
        }
    }
])

And in the last step you can add $out if you want to update existing collection.

mickl
  • 48,568
  • 9
  • 60
  • 89
  • 2
    Since the OP included a match pipeline (albeit an empty one), it may be worth a warning that using $out replaces the contents of an existing collection with the result of the aggregate. So if the match pipeline is ever modified to perform the aggregate on a subset, then those documents not included in the match will be removed from the existing collection - they are not simply left "unmodified". – Jerren Saunders Mar 14 '19 at 20:26
  • @JerrenSaunders I think he's trying to modify entire collection since this match is empty but it's good that you mentioned that: $out always replaces existing collection, thank you – mickl Mar 14 '19 at 20:29
  • @JerrenSaunders Good catch. Yes, there are some documents in the source and destination collections that do not have phone numbers in them, which means the resulting documents of the operation above are a subset of the original collection. I can't simply delete the existing records in the destination collection using $out. This is the reason I started with the two step update() approach, and why I had a blank $match query: I removed the actual criteria to simplify my question. My $match is : { 'info.phone': {$exists:1}}, to only copy timezone info for contacts who have provided a phone number. – John Marquez Mar 14 '19 at 20:41
  • @JohnMarquez have you considered $unwind with `preserveNullAndEmptyArrays: true` option ? This will keep those document withone `info.phone` – mickl Mar 14 '19 at 20:58