Which mongo document schema/structure is correct?

Question

I have two document formats and can't decide which one is the mongo way of doing things. Are the two examples equivalent? The idea is to search by userId and have userId indexed. It seems to me that performance will be equal for both schemas.

Multiple bookmarks as separate documents in a collection:

{
  userId: 123,
  bookmarkName: "google",
  bookmarkUrl: "www.google.com"
},
{
  userId: 123,
  bookmarkName: "yahoo",
  bookmarkUrl: "www.yahoo.com"
},
{
  userId: 456,
  bookmarkName: "google",
  bookmarkUrl: "www.google.com"
}

Multiple bookmarks within one document per user:

{
  userId: 123,
  bookmarks:[
    {
      bookmarkName: "google",
      bookmarkUrl: "www.google.com"
    },
    {
      bookmarkName: "yahoo",
      bookmarkUrl: "www.yahoo.com"
    }
  ]
},
{
  userId: 456,
  bookmarks:[
    {
      bookmarkName: "google",
      bookmarkUrl: "www.google.com"
    }
  ]
}
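
For reference, the userId index mentioned above might be created roughly as follows in the mongo shell. This is only a sketch: the collection names `bookmarks` and `userBookmarks` are hypothetical, and the unique constraint for the second layout is an assumption (one document per user).

```javascript
// Sketch only: `db` is the mongo shell's database handle;
// the collection names are hypothetical.

// Layout 1: one document per bookmark, so many documents share a userId.
function indexFlatLayout(db) {
  return db.bookmarks.createIndex({ userId: 1 });
}

// Layout 2: one document per user, so userId can be unique (an assumption).
function indexEmbeddedLayout(db) {
  return db.userBookmarks.createIndex({ userId: 1 }, { unique: true });
}
```

In the shell these would be called as `indexFlatLayout(db)` or `indexEmbeddedLayout(db)`; in both layouts a query such as `find({ userId: 123 })` can then use the index.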

score 1 · Answer 1 · answered Sep 15 '14 at 15:01

The problem with the second option is that it causes growing documents. Growing documents are bad for write performance, because the database will have to constantly move them around the database files.

To improve write performance, MongoDB always writes each document as a consecutive sequence to the database files with little padding between each document. When a document is changed and the change results in the document growing beyond the current padding, the document needs to be deleted and moved to the end of the current file. This is quite a slow operation.

Also, MongoDB has a hardcoded limit of 16MB per document (mostly to discourage growing documents). In your illustrated use-case this might not be a problem, but I assume that this is just a simplified example and your actual data will have a lot more fields per bookmark entry. When you store a lot of meta-data with each entry, that 16MB limit could become a problem.

So I would recommend the first option.
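
To get a feel for when the 16MB limit becomes relevant, a rough estimate can help. The sketch below is plain JavaScript for Node.js; it uses the length of the JSON text as a stand-in for the real BSON size, which is only an approximation, and the helper names are made up for illustration.

```javascript
// MongoDB's maximum document size: 16 MB (16 * 1024 * 1024 bytes).
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Rough size proxy: UTF-8 byte length of the JSON text.
// This is NOT the exact BSON size, only an order-of-magnitude estimate.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Estimate how many entries shaped like `sampleEntry` fit into one document.
// `overheadBytes` is a hypothetical allowance for the other fields.
function estimateMaxEntries(sampleEntry, overheadBytes = 0) {
  return Math.floor((MAX_DOC_BYTES - overheadBytes) / approxBytes(sampleEntry));
}

const sample = { bookmarkName: "google", bookmarkUrl: "www.google.com" };
console.log(approxBytes(sample), "bytes per entry (approx.)");
console.log(estimateMaxEntries(sample), "entries per document (approx.)");
```

With entries as small as the ones in the question the limit is far away, but the estimate shrinks quickly as more meta-data is stored per bookmark.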

score 0 · Answer 2 · answered Sep 15 '14 at 14:52

I would go with option 2 (multiple bookmarks within one document per user), because this schema takes advantage of MongoDB's rich documents, also known as "denormalized" models.

Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations. (quoted from the MongoDB documentation)

gpullen

To be honest you can only really find out by benchmarking this with the amount of data that you'd be using in production – gpullen Sep 15 '14 at 15:10

score 0 · Answer 3 · edited May 23 '17 at 12:29

There are two tools that allow applications to represent these relationships: references and embedded documents.

When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.

The second structure is the embedded type.

In general, choose the embedded structure when your application needs:

a) better performance for read operations.
b) the ability to request and retrieve related data in a single database operation.
c) data consistency, i.e. the ability to update related data in a single atomic write operation.

In MongoDB, operations are atomic at the document level. No single write operation can change more than one document. Operations that modify more than a single document in a collection still operate on one document at a time. Ensure that your application stores all fields with atomic dependency requirements in the same document. If the application can tolerate non-atomic updates for two pieces of data, you can store these data in separate documents. A data model that embeds related data in a single document facilitates these kinds of atomic operations.

d) to issue fewer queries and updates to complete common operations.
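
As an illustration of points c) and d), the following mongo shell sketch changes a user's bookmarks with one single-document update each. The collection name `userBookmarks` and the function names are hypothetical; `db` is the shell's database handle.

```javascript
// Sketch only: `userBookmarks` is a hypothetical collection
// holding one document per user with an embedded `bookmarks` array.

// Append one bookmark with a single-document (and therefore atomic) update.
function addBookmarkEmbedded(db, userId, name, url) {
  return db.userBookmarks.updateOne(
    { userId: userId },
    { $push: { bookmarks: { bookmarkName: name, bookmarkUrl: url } } },
    { upsert: true } // create the user's document if it does not exist yet
  );
}

// Rename one bookmark: the positional `$` operator refers to the
// array element matched by the query filter.
function renameBookmark(db, userId, oldName, newName) {
  return db.userBookmarks.updateOne(
    { userId: userId, "bookmarks.bookmarkName": oldName },
    { $set: { "bookmarks.$.bookmarkName": newName } }
  );
}
```

For example, `addBookmarkEmbedded(db, 123, "bing", "www.bing.com")` would add a third entry to user 123's array without touching any other document.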

When not to choose:

Embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation, and a document can never exceed the limit of 16MB.

Now let's compare the structures from a developer's perspective:

Say I want to see all the bookmarks of a particular user:

With the first type, you would need an aggregation over all of the user's documents to get a single combined result. At a minimum, the pipeline needs a $match stage and a $group stage (with the $push operator):

db.collection.aggregate([{$match:{"userId":123}},{$group:{"_id":"$userId","bookmarkNames":{$push:"$bookmarkName"},"bookmarkUrls":{$push:"$bookmarkUrl"}}}])

or a find() which returns multiple documents to be iterated.

Whereas the embedded type lets us fetch everything with a simple query filter in find():

db.collection.find({"userId":123});

This just illustrates the added overhead from the developer's point of view. We can view the first type as an unwound form of the embedded document.
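
The two read paths can be put side by side in a small mongo shell sketch, where calls are synchronous. The collection names `bookmarks` and `userBookmarks` and the function names are hypothetical; both functions return the same array shape, so the difference is only in how much work the caller does.

```javascript
// Sketch only: `db` is the mongo shell's database handle;
// the collection names are hypothetical.

// Separate documents: find() yields one document per bookmark,
// which we collect and reshape into an array of entries.
function bookmarksFlat(db, userId) {
  return db.bookmarks
    .find({ userId: userId })
    .toArray()
    .map(doc => ({ bookmarkName: doc.bookmarkName, bookmarkUrl: doc.bookmarkUrl }));
}

// Embedded: a single document already holds the whole array.
function bookmarksEmbedded(db, userId) {
  const doc = db.userBookmarks.findOne({ userId: userId });
  return doc ? doc.bookmarks : [];
}
```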

The first type, multiple bookmarks as separate documents in a collection, is normally used for cases such as logging, where the volume of log entries is huge and the entries have a TTL (time to live). In that case the collection would have a TTL index, so that documents are deleted automatically after a particular period of time, or it would be a capped collection, which discards the oldest documents once a size limit is reached.
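
Time-based expiry and capped collections are two separate mechanisms in MongoDB. A minimal mongo shell sketch of each follows; the collection names `logs` and `cappedLogs`, the field `createdAt`, and the function names are hypothetical.

```javascript
// Sketch only: `db` is the mongo shell's database handle;
// `logs`, `cappedLogs` and `createdAt` are hypothetical names.

// TTL index: documents are removed some time after their `createdAt`
// date is more than `seconds` in the past.
function expireLogsAfter(db, seconds) {
  return db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: seconds });
}

// Capped collection: fixed size in bytes; once it is full,
// the oldest documents are overwritten by new ones.
function createCappedLog(db, sizeBytes) {
  return db.createCollection("cappedLogs", { capped: true, size: sizeBytes });
}
```

For example, `expireLogsAfter(db, 3600)` would keep log entries for about an hour, while `createCappedLog(db, 1048576)` would keep roughly the most recent megabyte.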

Bottom line: if your documents will not grow beyond 16 MB at any point, opt for the embedded type. It would save development effort as well.

See Also: MongoDB relationships: embed or reference?
