154

I'm curious as to the pros and cons of using subdocuments vs a deeper layer in my main schema:

var subDoc = new Schema({
  name: String
});

var mainDoc = new Schema({
  names: [subDoc]
});

or

var mainDoc = new Schema({
  names: [{
    name: String
  }]
});

I'm currently using subdocs everywhere but I am wondering primarily about performance or querying issues I might encounter.

Ates Goral
cyberwombat
  • I was trying to type up an answer for you, but I couldn't work out how. Take a look here: http://mongoosejs.com/docs/subdocs.html – gustavohenke Mar 04 '13 at 20:26
  • Here is a good response about MongoDB considerations to ask yourself when creating your database schema: http://stackoverflow.com/questions/5373198/a-simple-mongodb-question-embed-or-reference – anthonylawson Mar 04 '13 at 20:41
  • You meant that it's required to also describe the `_id` field? I mean, it's not kinda automatic if it's enabled? – Vadorequest Feb 08 '14 at 19:02
  • Does anyone know if the `_id` fields of subdocuments are unique? (created using the second way in the OP's question) – Saitama Jan 16 '18 at 15:39

6 Answers

90

According to the docs, it's exactly the same. However, using a Schema also adds an _id field (unless you have disabled that), and presumably uses slightly more resources for tracking subdocs.

Alternate declaration syntax

New in v3: If you don't need access to the sub-document schema instance, you may also declare sub-docs by simply passing an object literal [...]

Community
AndyL
  • 1
    But I tried this. Why is the subdocument data not stored in a separate collection? It is always stored inside the mainDoc collection. – Fizer Khan May 27 '13 at 11:13
  • 21
    That's how subdocuments work: they are embedded inside a document. Before playing with Mongoose, make sure you understand the underlying MongoDB. – AndyL May 31 '13 at 17:06
  • 1
    Regarding the Schema adding _id, that makes sense but I created a schema with an array of sub-docs and an array of object literals and an _id was added to both. Has the behavior changed? – Drew Goodwin May 03 '16 at 17:12
  • @DrewGoodwin seems like it's been like this for a while: http://stackoverflow.com/questions/17254008/stop-mongoose-from-creating-id-property-for-sub-document-array-items – cheesemacfly May 10 '16 at 18:28
  • @DrewGoodwin yes, Mongoose automatically creates a schema for object literals declared within an array. https://mongoosejs.com/docs/subdocs.html#altsyntaxarrays – Kautilya Kondragunta Feb 24 '21 at 04:43
47

If you have schemas that are reused in various parts of your model, then it can be useful to define individual schemas for the child docs so you don't have to repeat yourself.

sonstone
33

You should use embedded documents only if they are static or number no more than a few hundred, because of the performance impact. I ran into this issue a while ago. More recently, Asya Kamsky, who works as a solutions architect for MongoDB, wrote an article about using subdocuments.

I hope this helps anyone looking for a solution or a best practice.

The original post is at http://askasya.com/post/largeembeddedarrays and her Stack Overflow profile is at https://stackoverflow.com/users/431012/asya-kamsky

First of all, we have to consider why we would want to do such a thing. Normally, I would advise people to embed things that they always want to get back when they are fetching this document. The flip side of this is that you don't want to embed things in the document that you don't want to get back with it.

If you embed activity I perform into the document, it'll work great at first because all of my activity is right there and with a single read you can get back everything you might want to show me: "you recently clicked on this and here are your last two comments" but what happens after six months go by and I don't care about things I did a long time ago and you don't want to show them to me unless I specifically go to look for some old activity?

First, you'll end up returning a bigger and bigger document while caring about a smaller and smaller portion of it. You can use projection to return only some of the array, but the real pain is that the document on disk will get bigger, and all of it will still be read even if you're only going to return part of it to the end user. And since my activity is not going to stop as long as I'm active, the document will continue growing and growing.

The most obvious problem with this is eventually you'll hit the 16MB document limit, but that's not at all what you should be concerned about. A document that continuously grows will incur higher and higher cost every time it has to get relocated on disk, and even if you take steps to mitigate the effects of fragmentation, your writes will overall be unnecessarily long, impacting overall performance of your entire application.

There is one more thing that you can do that will completely kill your application's performance and that's to index this ever-increasing array. What that means is that every single time the document with this array is relocated, the number of index entries that need to be updated is directly proportional to the number of indexed values in that document, and the bigger the array, the larger that number will be.

I don't want this to scare you from using arrays when they are a good fit for the data model - they are a powerful feature of the document database data model, but like all powerful tools, they need to be used in the right circumstances and with care.
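
One way to keep an embedded array from growing without bound is to split it across several fixed-size documents, an approach often called the bucket pattern. This is not taken from the quoted article. It is a plain JavaScript sketch in which an in-memory array stands in for a collection, and `BUCKET_SIZE` and `recordActivity` are made-up names.

```javascript
// Hypothetical bucket size; a real value depends on entry size and access patterns.
const BUCKET_SIZE = 3;

// Append an activity entry for a user, starting a new bucket "document"
// whenever the newest bucket for that user is already full.
// `buckets` is a plain array standing in for a collection.
function recordActivity(buckets, userId, entry) {
  let bucket = buckets.filter(b => b.userId === userId).pop();
  if (!bucket || bucket.entries.length >= BUCKET_SIZE) {
    bucket = { userId, seq: bucket ? bucket.seq + 1 : 0, entries: [] };
    buckets.push(bucket);
  }
  bucket.entries.push(entry);
  return bucket;
}

const buckets = [];
for (let i = 1; i <= 7; i++) {
  recordActivity(buckets, 'u1', { action: `click ${i}` });
}

// Seven entries with a bucket size of three end up in three documents.
console.log(buckets.map(b => b.entries.length)); // [ 3, 3, 1 ]
```

Each document then has a hard upper bound on its size, and reading recent activity only touches the newest bucket rather than the user's entire history.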

Community
efkan
  • 3
    This should be the top answer; it's bang on the money. MongoDB's own white papers say pretty much the same thing. – Jay Edwards Sep 13 '17 at 21:22
  • This article about the Bucket Pattern complements what Asya talks about nicely. https://www.mongodb.com/blog/post/building-with-patterns-the-bucket-pattern I think the subDoc schema in OP's question would work well with the Bucket Pattern. – plong0 Apr 03 '19 at 05:35
  • A few hundred whats? – Aaron Franke Aug 23 '22 at 00:57
23

Basically, create a variable nestedDoc and use it in the parent schema like this: names: [nestedDoc]

Simple Version:

var nestedDoc = new Schema({
  name: String
});

var mainDoc = new Schema({
  names: [nestedDoc]
});

JSON Example

{
    "_id" : ObjectId("57c88bf5818e70007dc72e85"),
    "name" : "Corinthia Hotel Budapest",
    "stars" : 5,
    "description" : "The 5-star Corinthia Hotel Budapest on the Grand Boulevard offers free access to its Royal Spa",
    "photos" : [
        "/photos/hotel/corinthiahotelbudapest/1.jpg",
        "/photos/hotel/corinthiahotelbudapest/2.jpg"
    ],
    "currency" : "HUF",
    "rooms" : [
        {
            "type" : "Superior Double or Twin Room",
            "number" : 20,
            "description" : "These are some great rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/2.jpg",
                "/photos/room/corinthiahotelbudapest/5.jpg"
            ],
            "price" : 73000
        },
        {
            "type" : "Deluxe Double Room",
            "number" : 50,
            "description" : "These are amazing rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/4.jpg",
                "/photos/room/corinthiahotelbudapest/6.jpg"
            ],
            "price" : 92000
        },
        {
            "type" : "Executive Double Room",
            "number" : 25,
            "description" : "These are amazing rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/4.jpg",
                "/photos/room/corinthiahotelbudapest/6.jpg"
            ],
            "price" : 112000
        }
    ],
    "reviews" : [
        {
            "name" : "Tamas",
            "id" : "/user/tamas.json",
            "review" : "Great hotel",
            "rating" : 4
        }
    ],
    "services" : [
        "Room service",
        "Airport shuttle (surcharge)",
        "24-hour front desk",
        "Currency exchange",
        "Tour desk"
    ]
}


Wayne Chiu
  • 5
    That doesn't address the question at all which is one of performance. – cyberwombat Sep 02 '16 at 16:34
  • I have edited a bit in order to make more sense. What do you think? – Wayne Chiu Sep 02 '16 at 20:32
  • 5
    The question is not asking how to do nested schemas. It's a discussion on whether Mongoose is more performant with nested schemas or embedded subdocuments. Basically we are talking benchmarks of sorts, or edge cases where Mongoose prefers one to the other. And as the selected answer mentions, it doesn't appear to make any difference, at least from v3 on. – cyberwombat Sep 02 '16 at 21:50
  • 20
    Maybe doesn't work for the OP, but I found this very helpful. Thanks. – Gene Higgins Nov 12 '16 at 17:44
  • This is good when all 3 schemas are declared in one .js file, how can we handle it when declared in 3 different .js files? – Satyam Feb 07 '19 at 15:47
  • any idea why for me it returned an empty array? to me, your example looks kind of identical to mine: https://stackoverflow.com/q/66292912/8780756 – enkicoma Feb 20 '21 at 16:00
10

I think this is handled elsewhere by multiple posts on SO.

Just a few:

The big key is that there is no single answer here, only a set of rather complex trade-offs.

Community
Gates VP
  • 4
    Perhaps I am not phrasing my question correctly - This is not a question of how I should structure my database but rather the internals of using a subschema vs just writing the array in a deeper layer. My primary cause for using a subschema is that I can make use of custom schema types and have them validate - something that doesn't work with nested arrays (from a previous question I had on SO). As near as I can tell a subdoc is pretty much the same as a nested array - I just don't know the internals of it - if using them would create performance issues or such. – cyberwombat Mar 04 '13 at 23:44
2

There are some differences between the two:

  • Using a nested schema is helpful for validation.

  • A nested schema can be reused in other schemas.

  • A nested schema adds an `_id` field to the subdocument unless you set `_id: false`.
Ahmad Zahabi