9

I've been wondering about the ideal document structure for maximum query efficiency for various situations and there's one I want to ask about. It's really borne out of me not really knowing how MongoDB behaves in memory in this specific kind of case. Let me give you a hypothetical scenario.

Imagine a Twitter-style system of Followers and Followees. After an admittedly cursory glance, the main options appear to be:

  1. In each user document, a "followers" array containing references to all the documents of other users they follow. Followees are found by finding our current user in other users' "user.followers" array. The main downside would appear to be the potential query overhead of the Followee search. Also, for a query specifically for the contents of "user.followers", does MongoDB just access the required field in users' documents, or is the whole user document found and then the required field values looked up from there and is this cached/stored in such a way that a query over a large user base would require significantly more memory?

  2. In each user document, storing both "followers" and "followees" for quicker access to each. This obviously has the downside of duplicate data in the sense that an entry for user A following user B exists in both user documents in the respective field, and deletion from from requires a matching deletion in the other. Technically, this could be considering doubling number of points of potential failure for a simple deletion. And does MongoDB still suffer from what I've heard described as "swiss cheesing" of it's memory-stored data when deletions occur, and so removals from the 2 fields rather than 1 doubles the effect of that memory hole problem?

  3. A separate collection for storing users' Followers, queried in a similar fashion to the user documents in 1- except that obviously the only data being accessed is Followers so if the user documents contain quite a lot of other data relevant to each user, we avoid accessing that data. This seems to have something of a relational database feel to it though and while I know that's not always a terrible approach just on principle, obviously if one of the other approaches mentioned (or one I haven't considered) is better under Mongo's architecture I'd love to learn!

If anyone has any thoughts on this, or wants to tell me I've missed a very relevant and and obvious docs page somewhere, or even wants to tell me that I'm just being stupid (thought with an explanation of why, please ;) ) I'd love to hear from you!

tdous
  • 1,599
  • 2
  • 14
  • 19
  • What programming language will you be using? Depending on that there are certain features which the underlying driver may or may not support. I am in particular talking about DBRefs. http://docs.mongodb.org/manual/applications/database-references/ – Amith George Jul 16 '12 at 20:15
  • That's a good point, thanks. We could end up using anything but currently a mix of PHP and Node.js. – tdous Sep 10 '12 at 19:16

2 Answers2

8

This is a classic follower-followee problem and there's no one answer to it..Check out this link:

mongo db design of following and feeds, where should I embed?

Actually this situation lends itself very well to a relational schema, if MongoDB and SQL server were the only choices you had. But this is a special type of relational problem wherein you have a two-way relationship. This can perhaps be better handled by a graph database:

http://forum.kohanaframework.org/discussion/10130/followers-and-following-database-design-like-twitter/p1

The thing is, you could either keep followers or followees in a User document, but not both, for avoiding double deletion issues. So if you must stick to MongoDB, one way out could be..(assuming people don't follow/unfollow anyone that frequently),

Keep just the followees in the document, because when I view my profile, I'd be interested in the people I follow.. (that's the reason I followed them in the first place, right?)..And then do a query like:

db.Users.find({ user_id : { $in : followees })

This will tell who all are following me (say my id is 'user_id').

Another reason why I don't suggest the other way round is that.. one may follow at the most 30-40 people, so User document storing 30-40 followees should be okay as against a User document storing thousands of followers! With the followee-in-document approach, you get an roughly even sized User documents throughout..In the follower-in-document approach, you will have some very small but some very bulky documents as well. And depending upon the amount of follower-data you put in (if any, apart from follower_id), you might want to be careful about the document size limit.

Community
  • 1
  • 1
Aafreen Sheikh
  • 4,949
  • 6
  • 33
  • 43
  • 1
    Nice! You covered all the points I had to say! Option 2 is definitely a no no. Storing the id's of users you are following is the way to go. Getting a list of users who are following you is just one query and can be indexed. Refer: http://www.mongodb.org/display/DOCS/Schema+Design – Amith George Jul 16 '12 at 20:36
  • This is the way I would consider too, however, I'm a bit concerned with the 'unbounded field' performance problem in mongo, which might make this a poor choice. See: http://stackoverflow.com/questions/9306815/mongodb-performance-with-growing-data-structure What are your thoughts on this? – UpTheCreek Nov 08 '12 at 19:50
  • @UpTheCreek The unbounded field here is the the list of ppl I follow. Assuming this doesn't grow beyond 30-40 users, it would be less of an issue compared to having an unbounded field containing thousands of followers. Again, this argument is very specific to this use case (follower-followee in twitter style). – Aafreen Sheikh Jan 02 '13 at 19:25
  • I have implemented this same exact strategy for a couple of apps. The only major drawback that you get from this relates to letting the client know if they like/follow someone. When you are serving a profile(s) to a request,the flag on whether the requesting user is following/liking another one will add to the complexity of your response logic. For each profile or page of profiles, you will need to search the followers collection on any match. This makes everything twice as slow :( regardless of the strategy you follow. Anyone has any ideas on how to improve this? – fino Feb 12 '15 at 19:32
2

Given that its a many to many relationship, option (2) look good to me. As for the matching deletions, its usually not an issue, as long as you have some sort of reconciliation mechanism between the two documents.

Fragmentation generally depends on the application's access patterns and is generally an issue with most data systems. Some significant changes have been made to mongo to avoid internal fragmentation. Further, there are offline compaction alternatives to fix fragmentation, if it happens.

Sid
  • 954
  • 6
  • 7