I've been wondering about the ideal document structure for maximum query efficiency in various situations, and there's one I want to ask about. It really stems from my not knowing how MongoDB behaves in memory in this specific kind of case. Let me give you a hypothetical scenario.
Imagine a Twitter-style system of Followers and Followees. After an admittedly cursory glance, the main options appear to be:
In each user document, a "followers" array containing references to the other users who follow that user. Followees are then found by searching for the current user in other users' "user.followers" arrays. The main downside would appear to be the query overhead of that followee search. Also, for a query that only wants the contents of "user.followers", does MongoDB access just that field in each user's document, or does it load the whole user document and then read the field from it? And is the result cached or stored in such a way that a query over a large user base would require significantly more memory?
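A minimal sketch of the first option, assuming a `users` collection where `_id` is the user ID and `followers` holds the IDs of the users who follow that user (the collection, field, and helper names are hypothetical). It only builds the filter and projection documents a driver such as PyMongo would send, plus a pure-Python stand-in for the followee search, so it runs without a server. Because an equality match on an array field matches any element, `{"followers": user_id}` finds every user that `user_id` follows. Note that the projection only limits which fields come back to the client; how much of each document the server itself reads is exactly the open question above.

```python
# Sketch of option 1: each user document stores the IDs of its followers.
# Collection/field names ("users", "followers") are hypothetical.

def make_user(user_id, name, followers=()):
    """Build a user document; `followers` holds the IDs of users following this one."""
    return {"_id": user_id, "name": name, "followers": list(followers)}

def followers_of(user_id):
    """(filter, projection) that reads one user's followers array.
    The projection limits which fields are returned to the client."""
    return {"_id": user_id}, {"_id": 0, "followers": 1}

def followees_of(user_id):
    """(filter, projection) for the followee search: equality against an
    array field matches documents where any element equals user_id."""
    return {"followers": user_id}, {"_id": 1}

# Index on the array field; MongoDB builds an index on an array field as a multikey index.
FOLLOWERS_INDEX = [("followers", 1)]

def followees_in_memory(users, user_id):
    """Pure-Python stand-in for users.find(*followees_of(user_id))."""
    return [u["_id"] for u in users if user_id in u["followers"]]

if __name__ == "__main__":
    users = [
        make_user("alice", "Alice", followers=["bob", "carol"]),
        make_user("bob", "Bob", followers=["alice"]),
        make_user("carol", "Carol"),
    ]
    print(followees_in_memory(users, "bob"))  # ['alice'] (the users bob follows)
```

With PyMongo, `db.users.find(*followees_of("bob"))` and `db.users.create_index(FOLLOWERS_INDEX)` would send these same documents to the server.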
In each user document, storing both "followers" and "followees" for quicker access to each. This obviously has the downside of duplicated data: an entry for user A following user B exists in both user documents, in the respective fields, and deleting it from one requires a matching deletion from the other. Technically, this could be considered as doubling the number of points of potential failure for a simple deletion. And does MongoDB still suffer from what I've heard described as "swiss cheesing" of its in-memory data when deletions occur, so that removing entries from two fields rather than one doubles the effect of that memory-hole problem?
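To make the "two writes per change" cost concrete, here is a minimal sketch of the second option, assuming hypothetical `followers` and `followees` array fields keyed by user `_id`. Each follow or unfollow produces two `(filter, update)` pairs using `$addToSet` and `$pull`; with PyMongo each pair would go to `update_one(filter, update)`. A small pure-Python applier stands in for the server so the sketch runs offline. If the two writes must succeed or fail together, they can be wrapped in a multi-document transaction (available since MongoDB 4.0 on replica sets).

```python
# Sketch of option 2: both sides of each relationship are stored.
# Field names ("followers", "followees") are hypothetical.

def follow_updates(follower_id, followee_id):
    """The two (filter, update) pairs one follow requires, one per document.
    $addToSet avoids duplicate entries if the request is retried."""
    return [
        ({"_id": follower_id}, {"$addToSet": {"followees": followee_id}}),
        ({"_id": followee_id}, {"$addToSet": {"followers": follower_id}}),
    ]

def unfollow_updates(follower_id, followee_id):
    """The matching pair of $pull updates; both must succeed to stay consistent."""
    return [
        ({"_id": follower_id}, {"$pull": {"followees": followee_id}}),
        ({"_id": followee_id}, {"$pull": {"followers": follower_id}}),
    ]

def apply_in_memory(users, updates):
    """Pure-Python stand-in for running each pair with update_one.
    Handles only the scalar $addToSet/$pull forms used above."""
    for flt, update in updates:
        doc = users[flt["_id"]]
        for op, fields in update.items():
            for field, value in fields.items():
                if op == "$addToSet":
                    arr = doc.setdefault(field, [])
                    if value not in arr:
                        arr.append(value)
                elif op == "$pull" and field in doc:
                    doc[field] = [v for v in doc[field] if v != value]

if __name__ == "__main__":
    users = {"alice": {"_id": "alice"}, "bob": {"_id": "bob"}}
    apply_in_memory(users, follow_updates("alice", "bob"))
    print(users["alice"]["followees"], users["bob"]["followers"])  # ['bob'] ['alice']
    apply_in_memory(users, unfollow_updates("alice", "bob"))
    print(users["alice"]["followees"], users["bob"]["followers"])  # [] []
```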
A separate collection for storing users' followers, queried in a similar fashion to the user documents in the first option, except that the only data being accessed is followers. So if the user documents contain a lot of other data relevant to each user, we avoid accessing that data. This does have something of a relational-database feel to it, though. I know that's not always a terrible approach just on principle, but if one of the other approaches mentioned (or one I haven't considered) is better under Mongo's architecture, I'd love to learn!
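A minimal sketch of the third option, assuming a hypothetical `follows` collection with one small `{follower, followee}` document per relationship and one compound index per lookup direction. The index specs use the `(key, direction)` list form that PyMongo's `create_index` accepts. Because the returned fields are all in the index and `_id` is excluded from the projection, such a query may be answerable from the index alone (a covered query), which would avoid reading even these small documents. A pure-Python filter stands in for `find` so the sketch runs offline.

```python
# Sketch of option 3: one document per follow relationship in a separate,
# hypothetically named "follows" collection.

def make_follow(follower_id, followee_id):
    return {"follower": follower_id, "followee": followee_id}

# Hypothetical compound indexes, one per lookup direction; the unique one
# also prevents the same relationship from being stored twice.
FOLLOW_INDEXES = [
    ([("follower", 1), ("followee", 1)], {"unique": True}),
    ([("followee", 1), ("follower", 1)], {}),
]

def followers_of(user_id):
    """(filter, projection) for who follows user_id; excludes _id so the
    second index may be able to cover the query."""
    return {"followee": user_id}, {"_id": 0, "follower": 1}

def followees_of(user_id):
    """(filter, projection) for whom user_id follows."""
    return {"follower": user_id}, {"_id": 0, "followee": 1}

def run_in_memory(edges, query):
    """Pure-Python stand-in for follows.find(filter, projection)."""
    flt, proj = query
    keep = [k for k, v in proj.items() if v]
    return [{k: e[k] for k in keep}
            for e in edges if all(e.get(f) == v for f, v in flt.items())]

if __name__ == "__main__":
    edges = [make_follow("alice", "bob"), make_follow("carol", "bob"),
             make_follow("bob", "alice")]
    print(run_in_memory(edges, followers_of("bob")))
    # [{'follower': 'alice'}, {'follower': 'carol'}]
```

With PyMongo, each entry would be created via `db.follows.create_index(keys, **options)`.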
If anyone has any thoughts on this, or wants to tell me I've missed a very relevant and obvious docs page somewhere, or even wants to tell me that I'm just being stupid (though with an explanation of why, please ;) ), I'd love to hear from you!