MongoDB embedded vs array sub document performance

Question

Given the below competing schemas with up to 100,000 friends I’m interested in finding the most efficient for my needs.

Doc1 (Index on user_id)

{
"_id" : "…",
"user_id" : "1",
friends : {
    "2" : {
        "id" : "2",
        "mutuals" : 3
    }
     "3" : {
         "id" : "3",
         "mutuals": "1"
    }
   "4" : {
         "id" : "4",
         "mutuals": "5"
    }
}
}

Doc2 (Compound multi key index on user_id & friends.id)

{
"_id" : "…",
"user_id" : "1",
friends : [
   {
        "id" : "2",
        "mutuals" : 3
    },
    {
         "id" : "3",
         "mutuals": "1"
    },
   {
         "id" : "4",
         "mutuals": "5"
    }
]}

I can’t seem to find any information on the efficiency of the sub field retrieval. I know that mongo implements data internally as BSON, so I’m wondering whether that means a projection lookup is a binary O(log n)?

Specifically, given a user_id to find whether a friend with friend_id exists, how would the two different queries on each schema compare? (Assuming the above indexes) Note that it doesn’t really matter what’s returned, only that not null is returned if the friend exists.

Doc1col.find({user_id : "…"}, {"friends.friend_id"})
Doc2col.find({user_id : "…", "friends.id" : "friend_id"}, {"_id":1})

Also of interest is how the $set modifier works. For schema 1,given the query Doc1col.update({user_id : "…"}, {"$set" : {"friends.friend_id.mutuals" : 5}), how does the lookup on the friends.friend_id work? Is this a O(log n) operation (where n is the number of friends)?

For schema 2, how would the query Doc2col.update({user_id : "…", "friends.id" : "friend_id"}, {"$set": {"friends.$.mutuals" : 5}) compare to that of the above?

Go with the array style (Doc2) as dynamic keys are almost never the right approach. Also, don't use smart quotes (it's not legal syntax and it's hard to read). — JohnnyHK, Nov 30 '12 at 04:28
I suppose Doc2 will use up like a couple of bytes of extra storage but as @JohnnyHK says Doc1 is not really a good approach, trust me the amount of questions from people who use Doc1 then realise they have to move to Doc2 to do anything with their schema... — Sammaye, Nov 30 '12 at 12:35
Thanks for the advice. @Sammaye why will Doc2 use up a couple of bytes extra storage? Are you referring to the index? Btw smart quotes were a mistake from copy paste — Nelson Shaw, Nov 30 '12 at 20:45
The index should be relatively the same size between the two, hmm I might be wrong actually I think an array of objects is smaller to store than a custom object of other custom objects, either way I think the difference in storage is almost negligible. — Sammaye, Nov 30 '12 at 20:47
This question has been asked/answered pretty well here:http://stackoverflow.com/questions/8077514/mongodb-preferred-schema-for-embedded-collections-documents-vs-arrays. Key point for me seems that the dynamic key approach is not indexable. — John Greenall, Nov 29 '13 at 15:02
Doc2 will also allow you to run an aggregate query using the $unwind to query the array elements. You can't do this with Doc1. — Adam Pridmore, Nov 05 '14 at 21:32

Gabe Rainbow · Answer 1 · 2014-12-01T03:57:39.217

doc1 is preferable if one's primary requirements is to present data to the ui in a nice manageable package. its simple to filter only the desired data using a projection {}, {friends.2 : 1}

doc2 is your strongest match since your use case does not care about the result Note that it doesn’t really matter what’s returned and indexing will speed up the fetch.

on top of that doc2 permits the much cleaner syntax

db.doc2.findOne({user_id: 1, friends.id : 2} )

versus

db.doc1.findOne({ $and : [{ user_id: 1 }, { "friends.2" : {$exists: true} }] })

on a final note, however, one can create a sparse index on doc1 (and use $exists) but your possibility of 100,000 friends -- each friend needed a sparse index -- makes that absurd. opposed to a reasonable number of entries say demographics gender [male,female], agegroups [0-10,11-16,25-30,..] or more impt things [gin, whisky, vodka, ... ]

MongoDB embedded vs array sub document performance

1 Answers1