How to avoid repetitive entries in MongoDB?

Question

My question is a little bit deeper than the title.

In the database, there will be millions (maybe billions in the future) of objects. Users will be related to these objects. Users will be objects' owners. Objects will be owned by multiple users (thousands) and users will own millions of objects.

So I don't want to create a document for every single relationship because many users will own the same objects.

I thought about storing user IDs in an array in each object document, but I'm not sure whether there will be a performance penalty. Also, MongoDB has a 16 MB limit per document, which is another drawback: each ObjectId is 12 bytes, so 1 million user IDs would consume at least 12 MB of the document. There has to be a better structure.
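
As a rough check on that estimate, here is a minimal sketch of how large an array of ObjectIds becomes once it is encoded as BSON. It assumes the layout from the BSON specification, where an array is stored as a document keyed by the decimal indices "0", "1", and so on, and it assumes the 16 MB limit means 16 × 1024 × 1024 bytes. The function name is made up for illustration.

```javascript
// Estimate the BSON size of an array holding `count` ObjectId values.
// Each element costs: 1 byte (type tag) + the index written as decimal digits
// + 1 byte (key terminator) + 12 bytes (the ObjectId itself).
function objectIdArrayBsonSize(count) {
  let size = 4 + 1; // int32 length prefix + trailing null byte of the array document
  for (let i = 0; i < count; i++) {
    size += 1 + String(i).length + 1 + 12;
  }
  return size;
}

const MAX_DOC_BYTES = 16 * 1024 * 1024; // assumed document size limit
const oneMillion = objectIdArrayBsonSize(1000000);
console.log(oneMillion, oneMillion > MAX_DOC_BYTES);
```

Under these assumptions the per-element keys add noticeable overhead on top of the raw 12 bytes, so an array of 1 million ObjectIds comes to roughly 19.9 MB and would not fit in a single document at all.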

How can I minimize the storage needed to record these relationships?

I believe users are independent enough to dedicate them each an entry in the collection. If you think many of them will be repeated, try to extract something in common and create a UserGroup collection that connects users to their group and to their objects. — Felipe Sulser, Apr 28 '16 at 12:41
@FelipeSulser Well, the common thing is the object itself. I could group them as the owners of a certain object, but that wouldn't make sense since it just stores the user array in a different place. — stackyname, Apr 28 '16 at 12:45
Possible duplicate of [MongoDB relationships: embed or reference?](http://stackoverflow.com/questions/5373198/mongodb-relationships-embed-or-reference) — joao, Apr 28 '16 at 13:34
@joao If that's the best answer, then I will always be at risk of hitting the document size limit. I just hope there is a better way to do this. — stackyname, Apr 28 '16 at 13:44
@stackyname what do you mean by best answer? I believe it is quite flexible, so you have to consider your own case. Basically it will walk you through your options at the moment, there are no magic pills here :) — joao, Apr 28 '16 at 13:54
@joao Haha, I know, but my case is similar to that one with a lot more data. For example, Tinder uses MongoDB and billions of swipes are happening every day. How does Tinder store that information? If it's just logging every single swipe as a document, then it's very expensive. — stackyname, Apr 28 '16 at 14:00

score 0 · Answer 1 · answered Apr 28 '16 at 13:45

In my opinion, you may have to break down your requirements. I believe data should be saved in a way that makes it easy to access. For that, you will have to look at how you plan to display or use the data.

In any case, if I own 1000 objects, it does not make sense to see them all at once. I would want to see them in pages, maybe 10 per page, or week by week, or day by day.

With that in mind, let's look at this scenario.

I own many documents.
Last week I owned 2, yesterday 5, today 10, and the list goes on.

Suppose I own the following docs.

Object1, Object2, Object5, Object6 ...

I will create an intermediate collection where I store the relationships. It will have one record per object per day (or per hour, if a more granular search is needed).

{
  "_id": "someId",
  "object": "Object1",
  "year": 2015,
  "month": 12,
  "day": 1,
  "hour": 12,
  "owned_by": [
    "stackyname",
    "titogeo"
  ]
},
{
  "_id": "someId",
  "object": "Object2",
  "year": 2015,
  "month": 12,
  "day": 1,
  "hour": 12,
  "owned_by": [
    "antman",
    "titogeo"
  ]
},
{
  "_id": "someId",
  "object": "Object2",
  "year": 2015,
  "month": 12,
  "day": 2,
  "hour": 13,
  "owned_by": [
    "batman",
    "heman"
  ]
}

That means I have one relationship document per object per hour. When I come to own an object, I push (upsert) my user ID into the current relationship document. The current relationship document is the one matched by:

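// currentYear, currentMonth, currentDay and currentHour are placeholders
// for the numeric parts of the current date and time.
find({
  "object": "Object6",
  "year": currentYear,
  "month": currentMonth,
  "day": currentDay,
  "hour": currentHour
});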
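
The upsert step described above could be sketched as follows. This is only an illustration under stated assumptions: the field names follow the sample documents, while the helper name, the choice of `$addToSet` (so the same user is not added twice), the use of UTC time, and the `relationships` collection name are all hypothetical.

```javascript
// Build the arguments for an upsert that adds `userId` to the current hourly
// relationship document for `objectName`.
function buildOwnershipUpsert(objectName, userId, now = new Date()) {
  const filter = {
    object: objectName,
    year: now.getUTCFullYear(),
    month: now.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: now.getUTCDate(),
    hour: now.getUTCHours(),
  };
  // $addToSet appends the user only if it is not already in the array.
  const update = { $addToSet: { owned_by: userId } };
  // upsert: true creates the hourly document when none matches the filter.
  const options = { upsert: true };
  return { filter, update, options };
}

const { filter, update, options } = buildOwnershipUpsert(
  "Object6",
  "titogeo",
  new Date(Date.UTC(2015, 11, 1, 12))
);
console.log(JSON.stringify({ filter, update, options }));
```

In the mongo shell these three values would be passed along the lines of db.relationships.updateOne(filter, update, options).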

If I want all the users who own an object, I can query the relationship collection with find({"object": "Object6"}) (with pagination, of course).

If I want all the documents that I own, I can query find({'owned_by' : 'titogeo'}).
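
To make the behaviour of those two lookups concrete, here is a plain JavaScript stand-in that runs against the three sample documents above (without their _id fields). It is not MongoDB code; it only mimics the rule that an equality query on an array field matches when any element equals the value. The function names and the paging over owners are assumptions made for illustration.

```javascript
// The three sample relationship documents from above.
const relationships = [
  { object: "Object1", year: 2015, month: 12, day: 1, hour: 12, owned_by: ["stackyname", "titogeo"] },
  { object: "Object2", year: 2015, month: 12, day: 1, hour: 12, owned_by: ["antman", "titogeo"] },
  { object: "Object2", year: 2015, month: 12, day: 2, hour: 13, owned_by: ["batman", "heman"] },
];

// Stand-in for find({ owned_by: user }): keep documents whose array contains the user.
function objectsOwnedBy(docs, user) {
  const names = docs.filter((d) => d.owned_by.includes(user)).map((d) => d.object);
  return [...new Set(names)]; // an object can appear in several hourly documents
}

// Stand-in for find({ object: name }) followed by paging through the owners.
function ownersOf(docs, objectName, page, pageSize) {
  const owners = docs.filter((d) => d.object === objectName).flatMap((d) => d.owned_by);
  return owners.slice(page * pageSize, (page + 1) * pageSize);
}

console.log(objectsOwnedBy(relationships, "titogeo"));
console.log(ownersOf(relationships, "Object2", 0, 3));
console.log(ownersOf(relationships, "Object2", 1, 3));
```

In MongoDB itself, the lookup by owner would normally be backed by an index on the array field, for example db.relationships.createIndex({ owned_by: 1 }), again with a hypothetical collection name.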

I am not an expert in schema design, nor do I know all the various techniques. These are just some thoughts I have; let me know yours.

Thanks for your answer. I was thinking of planning the schema like your example. My question is about the limitations of that schema. Of course it won't all be fetched at once, but that's the application side. My question: is collecting users in an array in the document a good approach? If the user count exceeds 1 million, would that be a problem? — stackyname, Apr 28 '16 at 13:56
The relationship document for an object covers one hour. If you think one million users are going to like/own a document within an hour, you can make it even more granular, right? Add a minute field as well. — titogeo, Apr 28 '16 at 14:26
I don't quite understand. How are you going to store millions of users' IDs in an array? 1 million users (12 bytes for each user's ID) already take 12 MB of the 16 MB document. — stackyname, Apr 28 '16 at 14:30
Just to make sure that we are on the same page: for one object in the relationship collection, there are multiple records. Each record holds the information about the users who came to own that object in a particular hour. My guess is that the number of users who come to own the object in one hour is not going to exceed one million. — titogeo, Apr 28 '16 at 14:39
My case has nothing to do with time. Once an object is owned by a user, it's owned forever, so the object's user array will only get bigger. Every object is basically user data, but since all users have certain objects from the object pool, I just want to reference users so there will be no object duplicates. — stackyname, Apr 28 '16 at 14:48
