
I have a basic question about where I should embed a collection of followers/following in MongoDB. It makes sense to embed a following list in each user document, but does it also make sense to embed the converse followers list? That would mean updating the embedded lists in the profile records of both users:

  1. the following list embedded in the follower's profile, and
  2. the followers list embedded in the followee's profile.

I can't ensure atomicity across those two updates unless I also keep a transaction or an update status somewhere. Is it worth embedding in both entities, or should I just do #1 (embed following in the follower's profile) and put an index on it so that I can query for the converse (followers) across all profiles? Is the performance hit of that query too much?
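A minimal in-memory sketch of option #1, in plain Python with no MongoDB involved, shows why the converse query can stay cheap: when an indexed field holds an array, MongoDB builds a multikey index with one index entry per array element, so "who follows X?" becomes an index lookup rather than a scan of every profile. The user ids and the helpers build_multikey_index and followers_of are hypothetical illustrations, not MongoDB APIs.

```python
from collections import defaultdict

# Hypothetical user documents for option #1: each embeds only the ids it follows.
users = {
    "alice": {"_id": "alice", "following": ["bob", "carol"]},
    "bob": {"_id": "bob", "following": ["carol"]},
    "carol": {"_id": "carol", "following": []},
}


def build_multikey_index(docs, field):
    """Map every array element to the _ids of the documents containing it.

    This mimics the shape of a multikey index: one entry per array element.
    """
    index = defaultdict(set)
    for doc in docs.values():
        for value in doc[field]:
            index[value].add(doc["_id"])
    return index


following_index = build_multikey_index(users, "following")


def followers_of(user_id):
    """Converse query, analogous to find({"following": user_id}) served by that index."""
    return sorted(following_index.get(user_id, set()))


print(followers_of("carol"))  # → ['alice', 'bob']
```

The lookup itself is cheap; what remains is the cost of ever-growing following arrays and of updating the index on every follow.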

Is this a case for a collection that should not be embedded at all? Should I just keep a collection of edges, storing each follow relationship as its own document with followerid and followedbyId?

Now, if I also have to update a feed for both users when one follows the other, how should I organize that?

As for the use case: users see the people they are following when viewing their feeds, which happens quite often, and they see the followers of a profile whenever they view anyone's profile detail, which also happens often but less than the first case. In both cases, the total numbers of following and followers show up on every profile page.

MonkeyBonkey

2 Answers


In general, it's a bad idea to embed following/followed-by relationships into user documents, for several reasons:

(1) there is a maximum document size limit of 16MB, and it's plausible that a popular user of a well-subscribed site might end up with hundreds of thousands of followers, which will approach the maximum document size,
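To put a rough number on point (1), the sketch below estimates how many follower ids fit in a single embedded array before the 16MB cap (16 MiB, i.e. 16,777,216 bytes). It assumes each id is a 12-byte ObjectId and counts only the BSON array itself: BSON encodes an array as a document keyed "0", "1", "2", ..., so each element costs a type byte, its decimal key plus a NUL terminator, and the 12-byte value, and the array adds a 4-byte length prefix and a trailing zero byte. The function names are hypothetical.

```python
TYPE_BYTE = 1        # BSON element type tag (0x07 for ObjectId)
OBJECTID_BYTES = 12  # ObjectId payload
ARRAY_OVERHEAD = 5   # int32 length prefix + trailing 0x00


def element_bytes(i):
    """Bytes for array element i: type byte + key "i" + NUL + 12-byte ObjectId."""
    return TYPE_BYTE + len(str(i)) + 1 + OBJECTID_BYTES


def array_bytes(n):
    """Encoded size of an array holding n ObjectIds (the array alone)."""
    return ARRAY_OVERHEAD + sum(element_bytes(i) for i in range(n))


def max_ids_within(limit):
    """Largest n such that an array of n ObjectIds fits in `limit` bytes."""
    size, n = ARRAY_OVERHEAD, 0
    while size + element_bytes(n) <= limit:
        size += element_bytes(n)
        n += 1
    return n


LIMIT = 16 * 1024 * 1024  # 16 MiB maximum BSON document size
print(max_ids_within(LIMIT))  # → 844416 (array only; other user fields lower this)
```

Anything else stored in the user document lowers that ceiling further, so a user with several hundred thousand followers is indeed close to the limit.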

(2) followership relationships change frequently, so a user who gains a lot of followers causes repeated document growth if you're embedding followers. Frequent document growth will significantly hinder MongoDB performance and so should be avoided (occasional document growth, especially if documents tend to reach a stable final size, is less of a performance penalty).

So, yes, it is best to split following/followed-by relationships out into a separate collection of records, each having two fields, e.g., { _id : , oid : }, with indexes on _id (for the "who am I following?" query) and oid (for the "who's following me?" query). Each individual state change is then modeled by a single document insertion or removal, though if you're also displaying things like follower counts, you should probably keep separate counters that you update after each edge insertion or deletion.
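A minimal in-memory model of this design, in plain Python, is below. It assumes hypothetical field names follower and followee instead of _id and oid, because _id values must be unique within a MongoDB collection and a user usually follows more than one account. The two dicts stand in for the two indexes, and the counters are updated after each edge insertion or deletion, as suggested above.

```python
from collections import defaultdict


class FollowStore:
    """In-memory stand-in for an edge collection plus per-user counters.

    Each edge mirrors one document {"follower": a, "followee": b}; the two
    dicts play the role of indexes on follower and on followee.
    """

    def __init__(self):
        self.edges = set()                   # (follower, followee) pairs, kept unique
        self.by_follower = defaultdict(set)  # "who am I following?"
        self.by_followee = defaultdict(set)  # "who's following me?"
        self.counts = defaultdict(lambda: {"following": 0, "followers": 0})

    def follow(self, a, b):
        if (a, b) in self.edges:
            return False  # duplicate edge: nothing inserted, nothing counted
        self.edges.add((a, b))
        self.by_follower[a].add(b)
        self.by_followee[b].add(a)
        # Counters change after the edge write; in MongoDB these would be
        # separate writes, so they can briefly lag behind the edges.
        self.counts[a]["following"] += 1
        self.counts[b]["followers"] += 1
        return True

    def unfollow(self, a, b):
        if (a, b) not in self.edges:
            return False
        self.edges.remove((a, b))
        self.by_follower[a].discard(b)
        self.by_followee[b].discard(a)
        self.counts[a]["following"] -= 1
        self.counts[b]["followers"] -= 1
        return True

    def following(self, a):
        return sorted(self.by_follower[a])

    def followers(self, b):
        return sorted(self.by_followee[b])


store = FollowStore()
store.follow("alice", "bob")
store.follow("carol", "bob")
print(store.followers("bob"), store.counts["bob"]["followers"])  # → ['alice', 'carol'] 2
```

Against a real deployment, each follow would roughly map to an insert_one on the edge collection (a unique index on the follower/followee pair rejects duplicates) followed by update_one calls with $inc on the two user documents. Those are separate writes, which is exactly the consistency trade-off the next paragraph discusses.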

(Of course, this supposes your business requirements allow you some flexibility on the consistency details: in general, if your display code tells a user he's got 304 followers and then proceeds to enumerate them, only the most fussy user will check that the followers enumerated tally up to 304. If business requirements necessitate absolute consistency, you'll either need a database that isolates transactions for you, or else you'll have to do the counting yourself as part of displaying all user identities.)

mpobrien
  • Despite being very relational in nature, I totally agree with your interpretation. This is one of those places where relations make perfect sense, and you don't end up with performance penalties this way. – Christopher WJ Rueber Nov 21 '11 at 20:15
  • Would this also apply to "likes"? The voting example on the mongo website embeds the likes in the document, but it seems the same line of reasoning applies to likes as to following. – MonkeyBonkey Nov 22 '11 at 00:56
  • @MonkeyBonkey - this approach could conceivably be used for "likes" as well, but you most likely would only want one of the indexes. The advantage of embedding for a "likes" scenario is that you can maintain an accurate count of "likers" using the $inc operator. Also, this depends on the site, of course, but the number of people who like a single post is unlikely ever to reach the number of followers of a high-traffic user, so the worst-case performance is probably less critical there. – mpobrien Nov 29 '11 at 20:31
  • The problem I'm having with creating a separate document for each "follow" is querying against them efficiently. If I want to get the 10 most recent posts from everyone I'm following, then I have a very inefficient query here... – Chet Apr 29 '15 at 01:35
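On Chet's concern: one common way to run that query against an edge collection is in two steps, first reading the followee ids from the edges, then querying posts with $in, sorted by creation time and limited to 10. The sketch assumes hypothetical author and created fields on posts and hypothetical sample data; feed_query returns the filter, sort spec and limit you could hand to a driver (for example pymongo's find(...).sort(...).limit(...)), and run_feed evaluates the same query in memory.

```python
import heapq

# Hypothetical sample data: (follower, followee) edges and posts with a timestamp.
edges = [("me", "bob"), ("me", "carol"), ("dave", "me")]
posts = [
    {"author": "bob", "created": 5, "text": "b5"},
    {"author": "carol", "created": 7, "text": "c7"},
    {"author": "dave", "created": 9, "text": "d9"},
    {"author": "bob", "created": 3, "text": "b3"},
]


def followee_ids(user):
    """Step 1: who does `user` follow? (an index on the follower field serves this)."""
    return [followee for follower, followee in edges if follower == user]


def feed_query(user, limit=10):
    """Step 2 as query documents: filter, sort spec and limit for the posts collection."""
    return {"author": {"$in": followee_ids(user)}}, [("created", -1)], limit


def run_feed(user, limit=10):
    """In-memory evaluation of the same query: newest posts by followed authors."""
    authors = set(followee_ids(user))
    matching = (p for p in posts if p["author"] in authors)
    return heapq.nlargest(limit, matching, key=lambda p: p["created"])


print([p["text"] for p in run_feed("me", limit=2)])  # → ['c7', 'b5']
```

A compound index on author and created lets MongoDB read each followed author's newest posts from the index, but the work still grows with the number of accounts followed, so this is a mitigation rather than a full answer to the concern.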

You can embed them all, but start a new document when you reach a certain limit. For example, you can cap each document at an array of 500 elements and then create a new one. Also, for the feed, you don't have to keep publications once they've been viewed; you can replace them with new ones, so you don't need to create new documents for additional publication storage.
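That capped-array idea can be sketched as a single upsert: match a bucket for this user whose count is still below the cap, push the new id and increment the count; if no bucket has room, the upsert creates a fresh one. The field names user, following and count are hypothetical, and apply_push is an in-memory stand-in that mirrors what such an upsert does.

```python
BUCKET_CAP = 500  # cap mentioned in the answer


def bucket_push_ops(user_id, followee_id, cap=BUCKET_CAP):
    """Filter and update documents for an upsert such as update_one(filt, update, upsert=True)."""
    filt = {"user": user_id, "count": {"$lt": cap}}
    update = {"$push": {"following": followee_id}, "$inc": {"count": 1}}
    return filt, update


def apply_push(buckets, user_id, followee_id, cap=BUCKET_CAP):
    """In-memory stand-in for that upsert over a list of bucket documents."""
    for bucket in buckets:
        if bucket["user"] == user_id and bucket["count"] < cap:
            bucket["following"].append(followee_id)  # $push
            bucket["count"] += 1                     # $inc
            return bucket
    # No bucket with room: the upsert inserts {user, following: [id], count: 1}.
    bucket = {"user": user_id, "following": [followee_id], "count": 1}
    buckets.append(bucket)
    return bucket


buckets = []
for i in range(1001):
    apply_push(buckets, "u1", i)
print(len(buckets), [b["count"] for b in buckets])  # → 3 [500, 500, 1]
```

With concurrent writers, two upserts can occasionally both create a new bucket, leaving two partly filled buckets, which is usually harmless for this use.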

To maintain performance, I'd advise you to store following relationships in a collection that you can query with the $graphLookup aggregation stage. A user can be followed by millions of accounts, so you should store whom people follow rather than who follows them. I hope this helps.
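A sketch of what that $graphLookup query might look like, assuming a hypothetical users collection and an edge collection named follows whose documents hold follower and followee fields. With maxDepth 0 it returns only the accounts the user follows directly; raising it also walks on to the accounts they follow. simulate_graph_lookup is a plain-Python breadth-first walk that mirrors the expansion, for illustration only.

```python
def follows_graph_pipeline(user_id, max_depth=0):
    """Aggregation pipeline to run on a hypothetical `users` collection."""
    return [
        {"$match": {"_id": user_id}},
        {"$graphLookup": {
            "from": "follows",               # hypothetical edge collection
            "startWith": "$_id",             # begin at the user's own id
            "connectFromField": "followee",  # recurse from whom they follow...
            "connectToField": "follower",    # ...into edges where that id follows others
            "as": "network",
            "maxDepth": max_depth,           # 0 = direct follows only
            "depthField": "depth",
        }},
    ]


def simulate_graph_lookup(edges, start, max_depth=0):
    """Breadth-first walk over (follower, followee) edges mirroring the expansion."""
    found, seen = [], set()
    frontier, depth = {start}, 0
    while frontier and depth <= max_depth:
        seen |= frontier
        matched = [e for e in edges if e[0] in frontier and e not in found]
        found.extend(matched)
        frontier = {followee for _, followee in matched} - seen  # skip cycles
        depth += 1
    return found


edges = [("me", "bob"), ("me", "carol"), ("bob", "dan"), ("dan", "me")]
print(len(simulate_graph_lookup(edges, "me", 0)), len(simulate_graph_lookup(edges, "me", 1)))  # → 2 3
```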

Raman