Given the two competing schemas below, where a user can have up to 100,000 friends, I’m trying to work out which is the most efficient for my needs.
Doc1 (Index on user_id)
{
    "_id" : "…",
    "user_id" : "1",
    "friends" : {
        "2" : {
            "id" : "2",
            "mutuals" : 3
        },
        "3" : {
            "id" : "3",
            "mutuals" : 1
        },
        "4" : {
            "id" : "4",
            "mutuals" : 5
        }
    }
}
Doc2 (Compound multikey index on user_id & friends.id)
{
    "_id" : "…",
    "user_id" : "1",
    "friends" : [
        {
            "id" : "2",
            "mutuals" : 3
        },
        {
            "id" : "3",
            "mutuals" : 1
        },
        {
            "id" : "4",
            "mutuals" : 5
        }
    ]
}
I can’t find any information on how efficient subfield retrieval is. I know that MongoDB stores documents internally as BSON, so I’m wondering whether that means looking up a projected field is a binary search, i.e. O(log n)?
Specifically, given a user_id, how would the following two queries (one per schema) compare when checking whether a friend with a given friend_id exists? (Assume the indexes above.) Note that it doesn’t really matter what is returned, only that a non-null result comes back if the friend exists.
Doc1col.find({user_id : "…"}, {"friends.friend_id" : 1})
Doc2col.find({user_id : "…", "friends.id" : "friend_id"}, {"_id":1})
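To pin down what I mean by "exists", here is a minimal plain JavaScript sketch of the answer I want from each schema. It runs without MongoDB and says nothing about what the server does internally; the helper names hasFriendSchema1 and hasFriendSchema2 are made up for illustration.

```javascript
// Plain JavaScript stand-ins for the two document shapes (no MongoDB involved).
const doc1 = {
  user_id: "1",
  friends: {
    "2": { id: "2", mutuals: 3 },
    "3": { id: "3", mutuals: 1 },
    "4": { id: "4", mutuals: 5 },
  },
};

const doc2 = {
  user_id: "1",
  friends: [
    { id: "2", mutuals: 3 },
    { id: "3", mutuals: 1 },
    { id: "4", mutuals: 5 },
  ],
};

// Schema 1: the friend id is a key of the embedded "friends" object.
// (Hypothetical helper, only models the result I want back.)
function hasFriendSchema1(doc, friendId) {
  return Object.prototype.hasOwnProperty.call(doc.friends, friendId);
}

// Schema 2: the friend id is a field of one of the array elements.
// (Hypothetical helper, only models the result I want back.)
function hasFriendSchema2(doc, friendId) {
  return doc.friends.some((f) => f.id === friendId);
}

console.log(hasFriendSchema1(doc1, "3"), hasFriendSchema2(doc2, "3")); // true true
console.log(hasFriendSchema1(doc1, "9"), hasFriendSchema2(doc2, "9")); // false false
```

What I am asking is how much work each real query does to reach that same yes/no answer.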
Also of interest is how the $set modifier works. For schema 1, given the query
Doc1col.update({user_id : "…"}, {"$set" : {"friends.friend_id.mutuals" : 5}})
how does the lookup on friends.friend_id work? Is this an O(log n) operation (where n is the number of friends)?
For schema 2, how would the query
Doc2col.update({user_id : "…", "friends.id" : "friend_id"}, {"$set" : {"friends.$.mutuals" : 5}})
compare to the one above?
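For clarity, this is the effect I expect from each update, again modelled in plain JavaScript rather than MongoDB. The helper names are hypothetical, and the missing-friend behaviour is my understanding of $set and the positional $ operator, not something I have measured.

```javascript
// Plain JavaScript model of the intended effect of each update (not MongoDB itself).
const doc1 = { user_id: "1", friends: { "2": { id: "2", mutuals: 3 } } };
const doc2 = { user_id: "1", friends: [{ id: "2", mutuals: 3 }] };

// Schema 1: "friends.<friendId>.mutuals" addresses the friend directly by key.
// Assumption: as with $set, the path is created when the key is missing.
function setMutualsSchema1(doc, friendId, mutuals) {
  if (!Object.prototype.hasOwnProperty.call(doc.friends, friendId)) {
    doc.friends[friendId] = {};
  }
  doc.friends[friendId].mutuals = mutuals;
}

// Schema 2: "friends.$.mutuals" updates the first array element matched by
// the query; if no element matches, nothing is modified.
function setMutualsSchema2(doc, friendId, mutuals) {
  const friend = doc.friends.find((f) => f.id === friendId);
  if (friend === undefined) return false;
  friend.mutuals = mutuals;
  return true;
}

setMutualsSchema1(doc1, "2", 5);
setMutualsSchema2(doc2, "2", 5);
console.log(doc1.friends["2"].mutuals, doc2.friends[0].mutuals); // 5 5
```

The end state is the same for an existing friend in both schemas; the question is how the cost of locating that friend differs.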