38

I am racking my brain over a good document structure for a messaging app.

I basically need three (or four) types of objects:

  1. The user (username, email, password, etc.)
  2. The contacts list (containing different contacts or contact groups)
  3. The conversation (a collection of messages between several people)
  4. The message (contains the message body, a timestamp, and the author)

My idea was to embed the contacts into the user document and to embed the messages in a conversation document:

1. User

{
    username: 'dev.puS',
    usernameCanonical: 'dev.pus', // used for unique constraints
    email: 'developement.pus@gmail.com',
    emailCanonical: 'developement.pus@gmail.com',
    salt: 'some hash',
    password: 'hash with salt',
    logs: { last_login: '12.06.2008', last_password_reset: '04.03.2007' }, // dates quoted so the literal parses
    state: { online: true, available: false },
    contacts: [ user_id1, user_id2, user_id3 ]
}

2. Conversation

{
    members: [ user_id1, user_id2 ],
    messages: [
        { author: user_2, body: "Hi what's up" },
        { author: user_1, body: 'Nothing out here :(' },
        { author: user_2, body: 'Whanna ask some question on stackoverflow' },
        { author: user_1, body: 'Okay, lets go' }
    ]
}

What do you think about this schema?

I think it would be better to keep them separate (each in its own document) because each has a different update frequency. But I don't really have any experience with this, so it would be good to hear some advice :)

Regards

dev.pus
  • 2
    A MongoDB schema is never “good” or “bad” by itself. You need to detail the queries and updates you’re going to make. Only then can you evaluate if a given schema suits these operation patterns. – Vasiliy Faronov Jun 27 '12 at 14:32
  • 1
    You also need to estimate the distribution of data sizes, e.g.: how many messages will a conversation contain, on average, at a maximum? This may be important if you want to embed. – Vasiliy Faronov Jun 27 '12 at 14:35
  • Okay, I will keep this in mind. Is it a common approach to cache, for example, the messages with Redis and then save them all to MongoDB when the session ends? I am a bit unsure about performing a lot of write operations on an "unstructured" object – dev.pus Jun 27 '12 at 14:43

4 Answers

23

I see that this question is old, but for anyone interested, a similar question was asked and one answer looks viable https://stackoverflow.com/a/30830429/132610

Conversation : {
 id: 123,
 members: [ user_id1, user_id2 ]
}
Message { conversationId: 123, author: user_2, body: "Hi what's up" }
Message { conversationId: 123, author: user_1, body: 'Whanna ask some question on stackoverflow' }
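As a rough illustration of the split above, here is a minimal in-memory sketch in plain JavaScript. The two arrays merely stand in for a `conversations` and a `messages` collection; the helper names (`createConversation`, `postMessage`, `messagesFor`) and the `createdAt` field are assumptions made for this example, not part of the original schema or of any MongoDB API.

```javascript
// In-memory stand-ins for two separate collections (illustrative only).
const conversations = [];
const messages = [];

// Hypothetical helper: a conversation only stores its id and members.
function createConversation(id, members) {
  const conversation = { id, members };
  conversations.push(conversation);
  return conversation;
}

// Hypothetical helper: posting a message adds one small document that
// references the conversation; the conversation document is not modified.
function postMessage(conversationId, author, body) {
  const message = { conversationId, author, body, createdAt: new Date() };
  messages.push(message);
  return message;
}

// Hypothetical helper: all messages that reference a given conversation.
function messagesFor(conversationId) {
  return messages.filter((m) => m.conversationId === conversationId);
}

createConversation(123, ['user_1', 'user_2']);
postMessage(123, 'user_2', "Hi what's up");
postMessage(123, 'user_1', 'Whanna ask some question on stackoverflow');

console.log(messagesFor(123).length); // 2
```

The point of the design is that the conversation document stays small and stable, while the frequently written data lives in its own collection.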

Update #1

1) Scalability: MongoDB copes well with very large collections, up to billions of messages per collection. A technique called sharding lets you split a larger collection across multiple nodes.

2) Reading: since MongoDB supports indexes, reads are comparable to any fine-tuned database engine, so reading will not be an issue, especially when a conversation (group or room) has few participants, for example two people messaging each other.
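To make the indexing point more concrete, here is a hedged sketch of what "fetch the latest N messages of one conversation" could look like. It only builds plain objects: an index key specification and the filter, sort, and limit of the query. The `createdAt` field, the `latestMessagesQuery` helper, and the `messagesCollection` variable in the comments are assumptions for illustration; the commented lines show roughly how these objects would be handed to the official Node.js driver.

```javascript
// Assumed compound index: equality on conversationId, newest first.
// createdAt is a hypothetical field; the original schema has no timestamp.
const messageIndexKeys = { conversationId: 1, createdAt: -1 };

// Hypothetical helper: describes the "latest N messages" query.
function latestMessagesQuery(conversationId, limit = 100) {
  return {
    filter: { conversationId },
    sort: { createdAt: -1 },
    limit,
  };
}

// With the Node.js driver these would be used roughly like this
// (messagesCollection is an assumed Collection handle):
//   await messagesCollection.createIndex(messageIndexKeys);
//   const q = latestMessagesQuery(123);
//   const docs = await messagesCollection
//     .find(q.filter).sort(q.sort).limit(q.limit).toArray();

const q = latestMessagesQuery(123, 100);
console.log(JSON.stringify(q));
```

Because the filter and the sort both match the index, the server can walk the index for one conversation instead of scanning the whole collection, which is why the total number of messages matters much less than it might seem.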

P.M
  • 3
    I'm confused: you (and everyone else) suggest one collection for `Conversations` and another collection for `Messages`. Let's say we have 1 million users talking to each other; the `Messages` collection may reach billions of documents. Can MongoDB manage such a big collection, and what about search response time? Say we search for the last 100 messages of a single user among billions of messages: how long will it take to come back? – Inzamam Malik Jan 18 '18 at 11:27
  • @InzamamMalik, I have the same exact question!! Were you able to find the "best practice" answer? – Moe kanan Sep 01 '18 at 18:43
  • Here is what confuses me. For a private conversation, we need to find out whether a private conversation already exists whose members are these two people. If yes, we should first load the history of that conversation (e.g. the 10 latest messages) and then send the message to it. So every time, we first need to search all private conversations the user is subscribed to. But if one side has deleted that conversation (unsubscribed), then we can't subscribe him/her to it again. Is there a comprehensive analysis on this topic? – Vahid Alimohamadi Oct 15 '19 at 10:51
8

Your question is really one of schema design. I suggest taking a look at this page on MongoDB schema design to get a sense of the choices and trade-offs: http://www.mongodb.org/display/DOCS/Schema+Design

In addition, you should probably review the links in the 'See Also' section of that document. I especially recommend the video presentations.

Finally, you should probably take a look at this document for a discussion of the three possible schemas for a messaging/commenting database, including the trade-offs for each design: http://docs.mongodb.org/manual/use-cases/storing-comments/

William Z
1

Please find my suggestion:

    Person : {
        person_id: '123',
        last_login: '12.06.2008',
        online: true
    }

    Conversation : {
        // append the greater person_id to the lower person_id,
        // e.g. person_1_id = 123 and person_2_id = 124 gives 123124
        conversation_id: '123124',
        messages: [
            { message_id: 1,
              message_text: "Hi what's up",
              sender_id: 123,
              receiver_id: 124,
              timestamp: 12344567891
            },
            { message_id: 2,
              message_text: 'fine',
              sender_id: 124,
              receiver_id: 123,
              timestamp: 12344567891
            }
        ]
    }
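The idea of deriving `conversation_id` from the two person ids can be sketched as a small function. This is only an illustration under stated assumptions: `conversationIdFor` is a hypothetical helper, ids are compared as strings (any consistent order works, since the goal is that both participants compute the same id), and a `_` separator is added, which deviates from the plain concatenation suggested above. Without a separator, the pairs (12, 3124) and (123, 124) would both produce 123124.

```javascript
// Hypothetical helper: builds the same id regardless of argument order.
function conversationIdFor(personA, personB) {
  const a = String(personA);
  const b = String(personB);
  if (a === b) {
    throw new Error('a private conversation needs two different people');
  }
  // Put the lower id first; the separator avoids ambiguous concatenations.
  return a < b ? `${a}_${b}` : `${b}_${a}`;
}

console.log(conversationIdFor(124, 123)); // 123_124
console.log(conversationIdFor(123, 124)); // 123_124
console.log(conversationIdFor(12, 3124) === conversationIdFor(123, 124)); // false
```

A deterministic id like this also lets you check whether a private conversation between two people already exists with a single lookup by id, instead of searching every conversation by its members.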
abRam
-3

This is my suggestion:

{
"_id" : ObjectId("5a9e9581a2147c0c0f00002e"),
"id_members1" : "5a9e9581a2147c0c0f02345t",
"id_members2" : "5a9e9581a2147c0c0f02134g",
"name" : [ 
    "Omar", 
    "Mohamed"
],
"messages" : [ 
    {
        "author" : "Omar",
        "body" : "salam 3likom",
        "create_at" : ISODate("2018-03-07T09:04:04.000Z")
    }, 
    {
        "author" : "Mohamed",
        "body" : "Wa3likom salam",
        "create_at" : ISODate("2018-03-07T09:04:04.000Z")
    }, 
    {
        "author" : "Mohamed",
        "body" : "wach teshak",
        "create_at" : ISODate("2018-03-07T09:04:04.000Z")
    }, 
    {
        "author" : [ 
            "Omar", 
            "Mohamed"
        ],
        "body" : "test msg",
        "create_at" : ISODate("2018-03-25T15:30:05.000Z")
    }
],
"comments" : [ 
    null, 
    {
        "author" : [ 
            "Omar", 
            "Mohamed"
        ],
        "body" : "test msg",
        "create_at" : ISODate("2018-03-25T15:28:11.000Z")
    }, 
    {
        "author" : [ 
            "Omar", 
            "Mohamed"
        ],
        "body" : "test msg",
        "create_at" : ISODate("2018-03-25T15:28:31.000Z")
    }
]

}

Amirouche Zeggagh
  • 1
    Every time you query this object, you would get all the messages? What about a conversation with 1 million messages? Wouldn't it be better to have a separate collection for messages? Or maybe create a cron job that moves older messages to a separate collection, so they can be paginated – Giovanne Afonso Oct 26 '18 at 18:02
  • Why did you add the comments array? Also, are there any recommendations if we want to do big group chats too? – Tejinder Feb 12 '19 at 11:37
  • 2
    There is a limit on document size in MongoDB, which caps how large an embedded array can grow (16MB in current versions; very old releases allowed 4MB). So this is really a bad design. It also makes querying what's inside the array too complex. – Sepehr GH May 15 '19 at 06:50
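The size concerns raised in these comments can be made concrete with a little arithmetic. MongoDB's documented maximum BSON document size is 16 MB in current versions, and an embedded `messages` array counts toward it. The sketch below estimates how many messages fit in one conversation document; the average message size of 200 bytes and the `maxEmbeddedMessages` helper are assumptions for illustration, so measure your own data before relying on the number.

```javascript
// Documented maximum BSON document size in current MongoDB versions.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: rough upper bound on embedded messages per document.
// avgMessageBytes and overheadBytes are estimates supplied by the caller.
function maxEmbeddedMessages(avgMessageBytes, overheadBytes = 0) {
  return Math.floor((MAX_BSON_DOCUMENT_BYTES - overheadBytes) / avgMessageBytes);
}

// Assuming roughly 200 bytes per embedded message:
console.log(maxEmbeddedMessages(200)); // 83886
```

Under that assumption a single embedded conversation tops out at well under a hundred thousand messages, so a conversation with a million messages would not fit in one document, which supports keeping messages in their own collection.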