
A very simple design problem: say I want to build something like Facebook Messenger. If John and Marry are chatting, which is the better approach?

1) One document per conversation, where messages is an array of message objects

{ participants: ['john', 'marry'],
  messages: [
      { sender: 'john', content: 'howdy', time_created: new Date() },
      { sender: 'marry', content: 'good u', time_created: new Date() },
      // ... more messages
  ]
}
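To make the "updating a conversation" side of the question concrete, here is a minimal in-memory sketch of option 1. A plain object stands in for the stored conversation document (an assumption for illustration, not real MongoDB calls); the comment shows roughly what the equivalent `$push` update would look like. The point is that every new message mutates and grows the same document.

```javascript
// Option 1 sketch: one document per conversation, messages embedded in an array.
// In MongoDB the equivalent write would be roughly:
//   db.conversations.updateOne({ _id: convId }, { $push: { messages: msg } })
// Here a plain object stands in for the stored document (illustration only).

function createConversation(participants) {
  return { participants: [...participants], messages: [] };
}

// Appending a message mutates (and grows) the single conversation document.
function appendMessage(conversation, sender, content) {
  const msg = { sender, content, time_created: new Date() };
  conversation.messages.push(msg);
  return conversation;
}

const convo = createConversation(['john', 'marry']);
appendMessage(convo, 'john', 'howdy');
appendMessage(convo, 'marry', 'good u');
console.log(convo.messages.length); // 2
```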

2) One document per message

{ participants: ['john', 'marry'], sender: 'john', message: 'howdy', time_created: new Date() } // document 1
{ participants: ['john', 'marry'], sender: 'marry', message: 'good u', time_created: new Date() } // document 2
// ... one more document per message
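For comparison, a minimal in-memory sketch of option 2, where each insert is an independent document and reading a conversation means filtering on participants and sorting by time. The array stands in for a hypothetical `messages` collection, and sorting the participants array before storing/comparing is an assumption added here so that `['john', 'marry']` and `['marry', 'john']` match the same conversation; it also shows why repeating participants in every message is awkward.

```javascript
// Option 2 sketch: one document per message. A plain array stands in for a
// hypothetical `messages` collection; in MongoDB each push would be an insert.
const messages = [];

// Assumption: participants are sorted so order does not affect matching.
function conversationKey(participants) {
  return [...participants].sort().join('|');
}

function insertMessage(participants, sender, message, time) {
  messages.push({ participants: [...participants].sort(), sender, message, time_created: time });
}

// Reading a conversation = filter on participants, then sort by creation time.
function loadConversation(participants) {
  const key = conversationKey(participants);
  return messages
    .filter((m) => conversationKey(m.participants) === key)
    .sort((a, b) => a.time_created - b.time_created);
}

insertMessage(['marry', 'john'], 'marry', 'good u', new Date(2015, 5, 14, 10, 1));
insertMessage(['john', 'marry'], 'john', 'howdy', new Date(2015, 5, 14, 10, 0));
insertMessage(['john', 'bob'], 'bob', 'hi', new Date(2015, 5, 14, 10, 2));
console.log(loadConversation(['john', 'marry']).map((m) => m.message)); // [ 'howdy', 'good u' ]
```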

Which approach has better performance in terms of inserting a new message (updating a conversation vs. creating a new document)?

Or is there a better approach? (With my second approach, I'm not sure it's good design to store the participants field in every document.)

Thanks!

Maria

1 Answer


Based on your example data for the messaging app, you could have two collections, Conversation and Messages, with a one-to-many relationship: one Conversation has many Messages.

Conversation:
{ id: 123,
  participants: ['john', 'marry'],
}


Message:
{ sender: 'john',
  content: 'howdy',
  time_created: new Date(),
  conversationId: 123
}
{ sender: 'marry',
  content: 'good u',
  time_created: new Date(),
  conversationId: 123
}
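A minimal sketch of the two-collection design above, using in-memory stand-ins for the Conversation and Message collections (not real MongoDB calls). The sequential id generator starting at 123 is hypothetical; MongoDB would normally assign an ObjectId `_id`. It illustrates that sending a message only appends to the messages side, while the conversation document is written once when the conversation starts.

```javascript
// In-memory stand-ins for the Conversation and Message collections.
const conversations = new Map(); // id -> { id, participants }
const messages = [];             // each entry references a conversation by id

let nextId = 123; // hypothetical id generator (MongoDB would normally use ObjectId)

function startConversation(participants) {
  const convo = { id: nextId++, participants: [...participants] };
  conversations.set(convo.id, convo);
  return convo.id;
}

// Sending a message only appends to `messages`; the conversation is untouched.
function sendMessage(conversationId, sender, content) {
  const msg = { sender, content, time_created: new Date(), conversationId };
  messages.push(msg);
  return msg;
}

function getMessages(conversationId) {
  return messages.filter((m) => m.conversationId === conversationId);
}

const convId = startConversation(['john', 'marry']);
sendMessage(convId, 'john', 'howdy');
sendMessage(convId, 'marry', 'good u');
console.log(getMessages(convId).length); // 2
```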

Creating a new document per message would be better in this case, because you can then have two applications (one for John and one for Marry) inserting messages without having to handle the possibility of both of them updating the same document. They just happen to share the same conversation session.

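Also, if a whole conversation is stored as a single document, that document keeps growing with every message (the document growth concern), and MongoDB caps a single BSON document at 16 MB, so a long conversation could eventually hit that limit.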
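A rough back-of-envelope check of the document growth concern. It assumes JSON string length as a proxy for BSON size and a hypothetical average of 200 bytes per message; real BSON encoding adds type bytes and field-name overhead, so the numbers are approximate.

```javascript
// How many embedded messages fit before one conversation document reaches
// MongoDB's 16 MB BSON document limit?
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16,777,216 bytes

// Assumption: JSON length is a rough proxy for BSON size (approximate only).
function approxBytes(doc) {
  return JSON.stringify(doc).length;
}

function maxEmbeddedMessages(bytesPerMessage) {
  return Math.floor(MAX_BSON_BYTES / bytesPerMessage);
}

const sample = { sender: 'john', content: 'howdy', time_created: new Date(0) };
console.log(approxBytes(sample));
console.log(maxEmbeddedMessages(200)); // 83886
```

Under these assumptions a single conversation tops out somewhere around tens of thousands of messages, which an active chat could plausibly reach.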

You can find out more about data modelling in the MongoDB documentation:

http://docs.mongodb.org/manual/core/data-modeling-introduction/

Also see MongoDB: Socialite for examples and discussion of a social network use case.

Hope it helps. Cheers.

Wan B.
  • Would updating 2 collections on every new message mean a lot of database writes? – Maria Jun 14 '15 at 16:25
  • You would only be updating the Message collection for every new message. The Conversation collection is only created/updated when there is a new participant or a new conversation is started. Your application can retain the conversation id (much like a session). – Wan B. Jun 14 '15 at 21:30
  • But if you don't update the Conversation collection on every new message (to store the last updated time), how would you fetch all the conversations sorted from most to least recent when a user opens the app? – Maria Jul 04 '15 at 18:47
  • You can sort by createdAt (or message._id, as it contains a timestamp). (Probably too late to answer, but someone else might find it helpful.) – Rafal Pastuszak Aug 22 '17 at 12:37
  • I'm confused: you (and everyone else) suggest one collection for `Conversations` and another collection for `Messages`. Let's say we have 1 million users messaging each other; the `Messages` collection may reach billions of documents. Can MongoDB manage such a big collection, and what about search response time? If we search for the last 100 messages of a single user among billions of messages, how long will it take to come back? – Inzamam Malik Jan 18 '18 at 11:27
  • See [MongoDB Sharding](https://docs.mongodb.com/manual/sharding/) and [MongoDB at Scale](https://www.mongodb.com/mongodb-scale). Please open a new question about scaling/performance. – Wan B. Jan 18 '18 at 21:45
  • What if there is a conversation between only two people? Is there then only one id in `participants`? – Ashh May 14 '18 at 12:14
  • @Ashish then there would be only one id in the Conversation collection. The same two users could also start another conversation, which would create another conversation id. – Wan B. May 23 '18 at 12:33
  • Can I use this schema for both group and one-to-one chat? – Ashh May 23 '18 at 12:35
  • @Ashish I think it's best if you open a new question describing your use case with examples. – Wan B. May 23 '18 at 12:48
  • A little late to the party, but worth mentioning: @Maria is partially right. You do need to run 2 queries, as you need to check for "write access" against the `Conversations` collection. The same goes for reading messages: you need to check `Conversations` for access first. – s.meijer Sep 01 '18 at 09:20
  • @s.meijer there's no need for 2 reads/writes, as each message has a conversation_id. And for recent messages, a min/max heap can be used. – MANN Jan 31 '19 at 02:20
  • @MANN, `db.messages.insert({ conversationId: 123, content: 'hi' })` needs to be validated/authorized by `db.conversations.find({ id: 123, participants: userId }).count() === 1`. Not every message should store all participants, but you do need to ensure that the insert message action is authorized. – s.meijer Jan 31 '19 at 12:34
  • For every message I have to check (conversation_id, user authenticity). Wouldn't it be heavy on the database to check on every message insert? Any solution? – Abd allah Khateeb Oct 11 '22 at 05:13
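One way to reduce the per-insert authorization cost discussed in these comments is to combine s.meijer's membership check with Wan B.'s suggestion that the application retain the conversation id like a session: verify membership once per session and conversation, then skip the lookup on later inserts. This is a hypothetical in-memory sketch, not a MongoDB API; the session object, the cache, and `membershipLookups` counter are illustrative assumptions, and cache invalidation when a participant is removed is deliberately not handled.

```javascript
// Hypothetical sketch: cache verified conversation ids per user session.
// `conversations` stands in for the Conversation collection. Removing a
// participant would require invalidating the cache (not handled here).
const conversations = new Map([
  [123, { id: 123, participants: ['john', 'marry'] }],
]);
const messages = [];
let membershipLookups = 0; // how often the "database" check actually runs

// Same idea as db.conversations.find({ id, participants: userId }).count() === 1
function isParticipant(conversationId, userId) {
  membershipLookups++;
  const convo = conversations.get(conversationId);
  return Boolean(convo && convo.participants.includes(userId));
}

function createSession(userId) {
  return { userId, verified: new Set() };
}

function postMessage(session, conversationId, content) {
  if (!session.verified.has(conversationId)) {
    if (!isParticipant(conversationId, session.userId)) {
      throw new Error(`${session.userId} may not post to ${conversationId}`);
    }
    session.verified.add(conversationId); // later posts skip the lookup
  }
  const msg = { sender: session.userId, content, time_created: new Date(), conversationId };
  messages.push(msg);
  return msg;
}

const john = createSession('john');
postMessage(john, 123, 'howdy');
postMessage(john, 123, 'still there?');
console.log(membershipLookups); // 1

const bob = createSession('bob');
try {
  postMessage(bob, 123, 'hi');
} catch (e) {
  console.log(e.message); // bob may not post to 123
}
```

The trade-off is freshness: a cached grant stays valid for the session's lifetime, so a real system would need to expire or invalidate it if membership can change.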