Currently I am working on a mobile app. Basically people can post their photos and their followers can like the photos, much like Instagram. I use MongoDB as the database. As on Instagram, there might be a lot of likes for a single photo. So using a separate indexed document for each single "like" seems unreasonable because it would waste a lot of memory. However, I'd like a user to be able to add a like quickly. So my question is: how should I model the "like"? Basically the data model is very similar to Instagram's, but using MongoDB.
-
You could have a `likes` field for each document that points to a photo, and use the `$inc` operator to update the field per document in an atomic way. But if you could post your current document structure and rephrase your requirement properly, you would get better answers. – BatScream Jan 18 '15 at 02:48
-
Adding likes will be extremely easy and fast; you can pass all the needed data straight to the server and literally do one query to insert straight into the DB. However, you will want to cache and aggregate the like count, since counting those likes on every read will be nasty. Most sites, including Instagram, use a counter, like @BatScream says, using `$inc` (or whatever exists in the tech they are using) to cache the like count, making it easy to say how many likes something has – Sammaye Jan 18 '15 at 02:57
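As a rough sketch only of what these comments suggest (the `likes` collection name and `likeCount` field are placeholders, not taken from the question):

var photoId = ObjectId("54bb201aa3a0f26f885be2a3"),   // example ids only
    userId  = ObjectId("54bb2244a3a0f26f885be2a4");

// One small document per like, written with a single insert
db.likes.insert({ "photo": photoId, "user": userId, "createdAt": new Date() })

// One atomic counter update on the photo, so reads never have to count like documents
db.photos.update({ "_id": photoId }, { "$inc": { "likeCount": 1 } })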
1 Answer
No matter how you structure your overall document, there are basically two things you need: a property for a "count" and a "list" of those who have already posted their "like", in order to ensure there are no duplicate submissions. Here's a basic structure:
{
    "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
    "photo": "imagename.png",
    "likeCount": 0,
    "likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever other information you want, plus the other fields as mentioned. The "likes" property here is an array, and it is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, whether in local storage or OpenID or something else, but a unique identifier. I'll stick with ObjectId for the example.
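A new photo post would then be seeded with the counter at zero and an empty array; a minimal sketch, assuming a `photos` collection:

db.photos.insert({
    "photo": "imagename.png",   // whatever photo metadata you keep
    "likeCount": 0,             // cached count, maintained atomically by the updates below
    "likes": []                 // unique _id values of users who have liked this post
})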
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
        "likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
    },
    {
        "$inc": { "likeCount": 1 },
        "$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
    }
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The important thing here is to keep a record of those users who voted, and that is what is happening in the "query" part of the statement. Apart from selecting the document to update by its own unique "_id", the other key point is checking the "likes" array to make sure the current voting user is not in there already.
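Because of those conditions, the write result tells you whether the like was actually recorded. A sketch using the shell's `WriteResult` (newer shells and drivers expose the same information as `modifiedCount` on `updateOne()`):

var photoId = ObjectId("54bb201aa3a0f26f885be2a3"),
    userId  = ObjectId("54bb2244a3a0f26f885be2a4");

var result = db.photos.update(
    { "_id": photoId, "likes": { "$ne": userId } },
    { "$inc": { "likeCount": 1 }, "$push": { "likes": userId } }
);

// nModified is 1 when the like was recorded, 0 when this user had already liked the photo
if ( result.nModified === 0 ) {
    print("duplicate like ignored");
}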
The same is true for the reverse case or "removing" the "like":
db.photos.update(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3"),
        "likes": ObjectId("54bb2244a3a0f26f885be2a4")
    },
    {
        "$inc": { "likeCount": -1 },
        "$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
    }
)
The important thing in both cases is the query conditions, which make sure that no document is touched unless all conditions are met. So the count does not increase if the user had already voted, and does not decrease if their vote was not actually present at the time of the update.
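If it helps readability, both updates can be wrapped in small helpers that report whether anything actually changed; a sketch only, with made-up function names:

// Returns true when the like/unlike actually modified the photo document
function likePhoto( photoId, userId ) {
    return db.photos.update(
        { "_id": photoId, "likes": { "$ne": userId } },
        { "$inc": { "likeCount": 1 }, "$push": { "likes": userId } }
    ).nModified === 1;
}

function unlikePhoto( photoId, userId ) {
    return db.photos.update(
        { "_id": photoId, "likes": userId },
        { "$inc": { "likeCount": -1 }, "$pull": { "likes": userId } }
    ).nModified === 1;
}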
Of course it is not practical to read back an array with a couple of hundred entries in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
    {
        "_id": ObjectId("54bb201aa3a0f26f885be2a3")
    },
    {
        "photo": 1,
        "likeCount": 1,
        "likes": {
            "$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
        }
    }
)
This usage of $elemMatch in projection will only return the current user's entry when they are present, and no entries when they are not. This allows the rest of your application logic to be aware of whether the current user has already placed a vote or not.
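In application code that projection result can then drive the UI; a small sketch of how it might be consumed:

var doc = db.photos.findOne(
    { "_id": ObjectId("54bb201aa3a0f26f885be2a3") },
    {
        "photo": 1,
        "likeCount": 1,
        "likes": { "$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") } }
    }
);

// "likes" holds at most the current user's entry, so its presence answers "have I already liked this?"
var currentUserHasLiked = !!( doc.likes && doc.likes.length );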
That is the basic technique and it may work for you as is, but you should be aware that embedded arrays should not be extended indefinitely, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but it just cannot be used on its own if you are expecting thousands of "like votes" on your content. There is a concept known as "bucketing", which is discussed in some detail in this example for Hybrid Schema design, and which offers one solution to storing a high volume of "likes". You can look at that to use along with the basic concepts here as a way to do this at volume.
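As a very rough illustration of that "bucketing" idea only (the collection name, bucket size, and fields here are hypothetical, not the exact scheme from the linked article), likes can be spread over bucket documents of bounded size:

var photoId = ObjectId("54bb201aa3a0f26f885be2a3"),
    userId  = ObjectId("54bb2244a3a0f26f885be2a4");

// Index so "has this user already liked this photo?" is a cheap lookup across all buckets
db.likeBuckets.ensureIndex({ "photo": 1, "likes": 1 })

// The duplicate check now needs its own query, since the $ne guard cannot span buckets
var alreadyLiked = db.likeBuckets.count({ "photo": photoId, "likes": userId }) > 0;

if ( !alreadyLiked ) {
    // Push into a bucket that still has room; the upsert starts a new bucket when all are full.
    // Note this check-then-write pair is not atomic like the single-document form above.
    db.likeBuckets.update(
        { "photo": photoId, "count": { "$lt": 1000 } },
        { "$push": { "likes": userId }, "$inc": { "count": 1 } },
        { "upsert": true }
    )
}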

-
Nice answer; sorry for bringing this up, but what is your idea about implementing a solution like this (instead of using a sub-document for keeping likes or votes)? http://stackoverflow.com/questions/26914380/schema-for-user-ratings-key-value-db – Disposer Jan 18 '15 at 06:19
-
@Disposer The general idea here is to make it as simple as possible to check if someone has voted or not, and to read back a total vote count without aggregation, or at least by splitting into as few documents as possible. Other models either rely on aggregation in real time or otherwise are not atomic in updates. Fast to write, and fast to read. For high-activity items that is usually what you want. – Neil Lunn Jan 18 '15 at 06:28
-
@Neil Lunn. Thanks for your answer. Actually my data structure is very similar to yours and I plan to use a bucketing design. I was wondering how good the performance of looking up a lot of buckets of likes using the $elemMatch operator is. Say there are 300,000 likes on a photo; I have 300 buckets and each bucket contains 1000 likes. Is it efficient to know whether the current user has already placed a vote or not? And I am also interested in the "other models either rely on aggregation in real time" you mentioned; can you explain more so that I can evaluate more options? – user2914635 Jan 18 '15 at 07:43
-
@user2914635 What you are now asking about is really another question, and even as such it's a pretty broad one at that. If you want to tour different techniques and don't mind reading through some code, then you can take a look through the [hvdf](https://github.com/10gen-labs/hvdf) and [socialite](https://github.com/10gen-labs/socialite) sources. There are even a few talks from Daren like [this one](http://www.mongodb.com/presentations/socialite-open-source-status-feed-part-3-scaling-data-feed). Also consider that getting 300,000 likes is going to be the exception rather than the rule. – Neil Lunn Jan 18 '15 at 07:57
-
@Neil Lunn. Basically the pro of the bucketing design is that it saves memory and makes it easy to retrieve likes to display to end users; the con is that it makes inserting a like more expensive. The pro of treating each like as a single document is that inserting a like is efficient and retrieving likes also has reasonable performance, but it wastes a lot of memory. Am I correct? As far as I know, Instagram posts usually get a lot of likes. Suppose I also have a lot of likes for one post. Which data model should I take? Need your suggestions, thanks in advance – user2914635 Jan 18 '15 at 08:06
-
@user2914635 As I said, this is really another question, and it really depends on how you interpret recording the "buckets". The links I gave you are two service designs that have concepts of "high volume" and "low volume" users and content, and deal with them differently so each is optimal. But if you have another question then post another question rather than asking more within comments. Please also do not forget to [accept your answers](http://meta.stackexchange.com/a/5235/252977) – Neil Lunn Jan 18 '15 at 08:11
-
@Neil Lunn. Hi, thanks for your quick reply. I think the "likes" of a photo are still the "likes" of a photo; they relate to a photo rather than to a user's timeline feed, so I don't see why you insist this is really another question. Actually, another idea I have is to create a "user_like" collection to track a user's likes. Instead of looking a user up in the "photo_like" collection, we look up whether the user has the photo id among his likes. The number of a user's likes across all photos is relatively small compared with that of a popular photo, making inserting a like less expensive. – user2914635 Jan 18 '15 at 11:54
-
I find this a good approach, but storing the ObjectIds of users who liked the post in a `"likes": []` field might exceed the maximum allowed document size, i.e. 16MB. This can happen when millions or billions of liking users' ObjectIds have to be stored in this field. So I believe it's better to have a separate collection called `like_logs` with postId and userId as fields. With this we can avoid duplicate likes by querying for `userId` in the `like_logs` collection. By doing so I believe query performance will also be good. – Mohammed Farhan Mar 11 '21 at 04:41
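For what it's worth, a minimal sketch of the separate-collection approach described in the comment above, assuming a `like_logs` collection (names are illustrative): a unique compound index rejects duplicate likes, and the cached counter on the photo is only bumped when the insert actually succeeded:

db.like_logs.ensureIndex({ "postId": 1, "userId": 1 }, { "unique": true })

var res = db.like_logs.insert({
    "postId": ObjectId("54bb201aa3a0f26f885be2a3"),
    "userId": ObjectId("54bb2244a3a0f26f885be2a4"),
    "createdAt": new Date()
});

// nInserted is 0 when the unique index rejected a duplicate like
if ( res.nInserted === 1 ) {
    db.photos.update(
        { "_id": ObjectId("54bb201aa3a0f26f885be2a3") },
        { "$inc": { "likeCount": 1 } }
    )
}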