168

From MongoDB The Definitive Guide:

Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance.

I don't understand this limit. Does this mean that a document containing a blog post with a lot of comments, which just so happens to be larger than 4MB, cannot be stored as a single document?

Also, do nested documents count toward this limit too?

What if I wanted a document that audits the changes to a value? (It may eventually grow, exceeding the 4MB limit.)

I hope someone can explain this correctly.

I have just started reading about MongoDB (the first NoSQL database I'm learning about).

Thank you.

0xdeadbeef
  • I think the question should clarify that this is a limitation of MongoDB's stored document sizes and not of the BSON format. – alexpopescu Jan 12 '11 at 14:03
  • Though, I just tried saving a huge document that most certainly exceeds 4MB and got the message "BSON::InvalidDocument: Document too large: BSON documents are limited to 4194304 bytes." If that's the case, isn't the warning/error message kind of misleading? – Nik So Feb 24 '11 at 19:21
  • You can easily find your max BSON document size with the `db.isMaster().maxBsonObjectSize/(1024*1024)+' MB'` command in the `mongo` shell. – ahmet alp balkan Oct 28 '11 at 16:39
  • What is the purpose of a schemaless NoSQL database, with CRUD operations built on top of it, if you cannot store records larger than 16 MB? – Rizwan Patel Aug 19 '16 at 11:17
  • I think the initial quote says it all... The limit is in place to prevent bad schema design. If, for instance, you have a post with many comments, you would want a blog entry collection and a comment collection, or a changes collection. The design of mongo/nosql allows for massively-sized things as networks of documents, but the developer needs to break them into parts that make sense. If no size limit is set, other problems will happen. I think the 4MB limit was fine. 16MB, great! But if I'm writing a 16MB document, that is a clue that something else is wrong with the design. – Eyelash Apr 09 '18 at 12:27

7 Answers

138

First off, this is actually being raised in the next version to 8MB or 16MB ... but to put this into perspective, I think Eliot from 10gen (the company that develops MongoDB) puts it best:

EDIT: The size has been officially 'raised' to 16MB

So, on your blog example, 4MB is actually a whole lot. For example, the full uncompressed text of "War of the Worlds" is only 364k (html): http://www.gutenberg.org/etext/36

If your blog post is that long with that many comments, I for one am not going to read it :)

For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k).

So except for truly bizarre situations, it'll work great. And in the exceptional case of spam, I really don't think you'd want a 20mb object anyway. I think capping trackbacks at 15k or so makes a lot of sense no matter what, for performance. Or at least special-casing it if it ever happens.

-Eliot

I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.

The main point of the limit is so you don't use up all the RAM on your server (since the entire document is loaded into RAM when you query it).

So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.
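
As a side note, you can ask a running server what limit it enforces: the `isMaster` command's `maxBsonObjectSize` field, also mentioned in the comments on the question. A minimal mongo-shell sketch:

// mongo shell: report the maximum BSON document size the server enforces.
// isMaster returns maxBsonObjectSize in bytes.
var maxBytes = db.isMaster().maxBsonObjectSize;
print(maxBytes / (1024 * 1024) + " MB"); // e.g. "16 MB" on current servers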

Note on Storing Files in MongoDB

If you need to store documents (or files) larger than 16MB, you can use the GridFS API, which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM).

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

You can use this method to store images, files, videos, etc. in the database, much as you might in a SQL database. I have even used this to store multi-gigabyte video files.
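
For illustration, here is a minimal upload sketch using the official Node.js driver's GridFSBucket API; the connection string, database, bucket, and file names are placeholders, not anything from this thread:

// Minimal GridFS upload sketch using the official Node.js driver.
// Assumes a local mongod; all names here are illustrative only.
const { MongoClient, GridFSBucket } = require("mongodb");
const fs = require("fs");

async function upload() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("media");
  const bucket = new GridFSBucket(db, { bucketName: "videos" });

  // GridFS splits the file into chunks stored in videos.chunks,
  // with file metadata kept in videos.files.
  fs.createReadStream("./talk.mp4")
    .pipe(bucket.openUploadStream("talk.mp4"))
    .on("finish", () => client.close());
}

upload();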

Justin Jenkins
  • I don't really understand "The main point of the limit is so you don't use up all the RAM on your server". We keep our entire MongoDB database in RAM, so is this still a concern? – Sean Bannister Dec 10 '11 at 14:03
  • That's awesome that you have enough RAM for your entire database ... Typically the "working set" is in RAM, not the whole database (in my case I have several multi-GB databases that, added together, would exceed my RAM, but that's okay because the working set is much, much smaller). Also, if there were no limit you might load an 800MB doc into RAM with one query and a 400k doc with another, making balancing your RAM a little difficult. So the "limit" is some % of typical server RAM (and thus it grows over time). http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage – Justin Jenkins Dec 12 '11 at 06:46
  • It's great that you can store everything in RAM, but consider efficiency and the blog post idiom. You obviously want a post to be in memory if it's read. But do you really want 10 pages of comments for a blog post to be in memory when most people will never read past the first page? Sure, you can do it, and if your database is small enough that it can all fit in memory, then no problem. But in terms of pure efficiency, you do not want useless bits to take up memory space if you can avoid it (and that goes for RDBMSs as well). – AlexGad Dec 24 '11 at 16:52
  • Sweet jesus, so Mongo's argument is "16 MB should be enough for anybody"? It's not like that has ever proven to be incorrect in the past. – Robert Christ Aug 28 '14 at 17:21
  • It would be nice if there were an example of how to deal with a situation like this. It is conceivable that one could easily exceed 16MB if files (e.g. images, etc.) were also allowed within an application's comments. – Isius Nov 10 '15 at 19:43
  • This seems too bad to me. Mongo is supposed to be useful for big data, not to have such limitations. In my project, I need to aggregate and group tweets that relate to the same trending topic, and this might end up as more than 20000 tweets for a time period of 20 hours (and it's quite possible that there will be trends lasting more than 20 hours in my db). Having that many tweets and storing their text at the same time is devastating, and after grouping a few small trends, it ends up with an exception on a big trend. – Savvas Parastatidis Jan 24 '16 at 12:47
  • @Savvas is right. What is the purpose of a 16 MB limit in a schemaless NoSQL system? We understand why the limit is there, but in a real-world scenario the alternative is to distribute data across various documents and apply linkage for CRUD operations, just as in a conventional RDBMS! – Rizwan Patel Aug 19 '16 at 11:20
  • @Savvas why would you put all the tweets in a single document? Use one document per tweet, put the trending topic as another field on the document, put an index on that topic field, and then aggregate on that field using the mongo pipeline (sketched just below these comments). It takes some adjusting of how you do things to work with NoSQL, but once you adjust your methods and thinking, you'll find it works great for many big data use cases. – schmidlop Sep 22 '16 at 14:51
  • @schmidlop I don't quite remember right now, but I think that's what I did. But when I aggregated on the topic field, it created a single document for each key, and ended up with huge documents for the biggest topics. Anyway, that was last year and I can barely remember my implementation :P – Savvas Parastatidis Sep 23 '16 at 08:26
  • Is it possible to increase it to more than 16MB? – Diego Ramos Jan 29 '21 at 17:00
  • @DiegoRamos no, this limit is imposed by the MongoDB code. – Aryan Beezadhur Feb 06 '21 at 21:20
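
A minimal mongo-shell sketch of the one-document-per-tweet approach schmidlop suggests above (the collection, field, and topic names are illustrative, not from the thread): each tweet is its own small document, and the grouping happens at query time via the aggregation pipeline, so no single document ever approaches the 16MB limit.

// One small document per tweet; the trending topic is just an indexed field.
db.tweets.insertOne({ topic: "#mongodb", text: "...", created: new Date() });
db.tweets.createIndex({ topic: 1 });

// Group at query time instead of storing one giant per-topic document.
db.tweets.aggregate([
  { $match: { topic: "#mongodb" } },
  { $group: { _id: "$topic", count: { $sum: 1 } } }
]);
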
38

Many in the community would prefer no limit with warnings about performance, see this comment for a well reasoned argument: https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283

My take: the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. It's another example of personality and politics detracting from a product in open-source communities, but this is not really a crippling issue.

marr75
  • I totally agree with you; it also defeats the purpose of having embedded documents now, as most embedded documents will cross the limit easily, especially with arrays of documents inside them. – Sharjeel Ahmed Feb 15 '16 at 08:43
  • @marr75 it says fixed now, has it been fixed? – Mafii Apr 27 '16 at 07:11
  • I mean, the limit was raised to 16MB, but that doesn't fix the "issue" long term; IMO the limit should just be eliminated. – marr75 Jun 02 '16 at 18:56
  • Six-year-old thread necro. I am firmly unconvinced by your specific bad use case/design example. Also, that example is much better at illustrating why you need to validate inputs than why a database needs a single-document size limit. Making the application split its nested documents into individual documents in another collection, or start a new "continuation" document (solutions I have used several times to work within this limit), had little impact on performance but big impacts on code complexity. The entire point of document DBs is data locality. – marr75 May 24 '18 at 18:46
  • Adding an additional 2¢: a limitation like this does not, in fact, "defeat the purpose" of embedded documents. My gaming forums, for example, store all replies to a thread within the thread. To exceed the current 16MB limit would require the community to collectively write a novel of more than 500 chapters within a single thread; this _will not happen_. (6.5 average bytes per word, 5K-word chapter length.) – amcgregor Mar 26 '19 at 16:03
  • Thanks for doing about the same math the MongoDB documents do to defend this decision, but your single use case and thought experiment are far from conclusive. I have had to come up with complex, redundant designs to work around the fact that there is an arbitrary limit that does get hit by mongo (without deeply nested or duplicated entries, btw). By your logic, no database should need to contain more than 16MB total because some arbitrary text can be represented using less storage. This is obviously silly. – marr75 Jun 15 '19 at 20:42
  • @marr75 So obviously silly and barely comprehensible, I'm glad it's not even close to what I said. Your logical fallacy the superior argument makes. (Yearly bookmark cleanup, glad I ran across this again.) – amcgregor Jan 31 '21 at 16:39
37

To post a clarifying answer here for those who get directed here by Google:

The document size includes everything in the document, including subdocuments, nested objects, etc.

So a document such as:

{
  "_id": {},
  "na": [1, 2, 3],
  "naa": [
    { "w": 1, "v": 2, "b": [1, 2, 3] },
    { "w": 5, "b": 2, "h": [{ "d": 5, "g": 7 }, {}] }
  ]
}

has a maximum size of 16 MB.

Subdocuments and nested objects are all counted towards the size of the document.
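
To check how close an existing document is to the limit, the mongo shell's `Object.bsonsize()` helper reports the document's full BSON size. A small sketch (the `posts` collection name is illustrative):

// mongo shell: measure the BSON size of a stored document in bytes,
// counting subdocuments, nested objects, and key names alike.
var doc = db.posts.findOne();
print(Object.bsonsize(doc) + " bytes"); // must stay under 16777216 (16 MB)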

Aryan Beezadhur
Sammaye
  • The single largest possible structure able to be represented in BSON is, ironically, also the most compact. Despite the fact that MongoDB uses `size_t` (64-bit) array indexes internally, the 16MB document size limit would, at best, be able to represent a document containing a single array itself containing two million NULLs. – amcgregor Mar 26 '19 at 16:14
  • Apologies, adding a second comment to address/clarify another important detail: when you say _document size includes everything in the document_, that also includes the _keys_. E.g. `{"f": 1}` is two bytes smaller than `{"foo": 1}`. This can rapidly add up if you aren't careful, though modern on-disk compression does help. – amcgregor Mar 26 '19 at 16:17
6

I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?

JSON is a text format, so if you are accessing your data through JSON, binary files are especially costly: they have to be encoded in uuencode, hexadecimal, or Base64. The conversion path might look like

binary file <> JSON (encoded) <> BSON (encoded)

It would be more efficient to put the path (URL) to the data file in your document and keep the data itself as a binary file outside the database.
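
For example, the document might hold only a reference to the binary data; a minimal mongo-shell sketch (all field names and the URL are illustrative):

// Store only a pointer to the large binary; the file itself lives
// on disk or a CDN, outside the database.
db.videos.insertOne({
  title: "Conference talk",
  mimeType: "video/mp4",
  url: "https://cdn.example.com/videos/talk.mp4", // or a filesystem path
  sizeBytes: 734003200
});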

If you really want to keep files of unknown length in your DB, then you would probably be better off putting them in GridFS, rather than risking killing your concurrency when the large files are accessed.

Chris Golledge
  • "There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems." See http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb – redcalx Jul 13 '15 at 12:01
6

Nested Depth for BSON Documents: MongoDB supports no more than 100 levels of nesting for BSON documents.

For more info, see the MongoDB documentation on BSON document limits.
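
A minimal mongo-shell sketch of hitting that limit (the `test` collection name is illustrative): build a document nested more than 100 levels deep, and the server rejects the insert.

// Build a document nested well past MongoDB's 100-level limit.
var doc = { leaf: true };
for (var i = 0; i < 101; i++) {
  doc = { nested: doc };
}
db.test.insertOne(doc); // fails with a nesting-depth error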

user2903536
2

According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

If you expect that a blog post may exceed the 16MB document limit, you should extract the comments into a separate collection, reference the blog post from each comment, and do an application-level join.

// posts
[
  {
    _id: ObjectID('AAAA'),
    text: 'a post',
    ...
  }
]

// comments
[
  {
    text: 'a comment',
    post: ObjectID('AAAA')
  },
  {
    text: 'another comment',
    post: ObjectID('AAAA')
  }
]
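
The application-level join then takes two queries, one per collection; a minimal mongo-shell sketch ('AAAA' is the same placeholder id used above, not a valid ObjectId):

// Application-level join: fetch the post, then its comments, in two queries.
var post = db.posts.findOne({ _id: ObjectId('AAAA') }); // 'AAAA' is a placeholder
var comments = db.comments.find({ post: post._id }).toArray();
// The application renders the post together with its comments.
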
mzarrugh
1

Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.

You should probably store comments in a collection separate from blog posts anyway.

[edit]

See comments below for further discussion.

Mchl
  • Don't know about the best design at this early stage of the experience. The book gives a little example of a blog, hence the thought. Thanks. – 0xdeadbeef Jan 12 '11 at 10:30
  • I don't agree at all. Comments in your blog post documents should be perfectly fine in MongoDB ... it's a very common use (I use it in more than one place in production and it works quite well). – Justin Jenkins Jan 12 '11 at 10:34
  • @Justin Jenkins: I agree with you, but it really depends on the site. So I believe that sites like Stack Overflow need to create a separate document for comments. – Andrew Orsich Jan 12 '11 at 10:38
  • @Bugai13 ... sure, for SO, but 98% of sites aren't anywhere close to that :) That's all I'm saying. For most sites it will work great (and possibly actually HELP more than it could hurt). That said, depending on your scale, storing comments in a separate collection might be the only option. – Justin Jenkins Jan 12 '11 at 10:41
  • I was perhaps overly strict in my answer. There's nothing wrong with storing blog posts and associated comments in MongoDB or a similar database. It's more that people tend to overuse the abilities document-based databases give them (the most radical example would be to store all your data in a single document called 'blog'). – Mchl Jan 12 '11 at 10:44
  • @Justin: I agree again... But one more thing: the design of a document db also depends on the site design, because if you need to display a listing of threads without comments (like the list of questions at Stack Overflow), you certainly need to create separate documents for the blog and comments. – Andrew Orsich Jan 12 '11 at 10:46
  • @Mchl totally agree on the abuse! A "blog" document would be insane. :) – Justin Jenkins Jan 12 '11 at 10:47
  • @Bugai13 I might not totally understand what you are saying, but I'd think it's actually a pretty simple MongoDB query to "display a listing of threads without comments." That said, I think I see your point. – Justin Jenkins Jan 12 '11 at 10:54
  • @Mchl: "blog" isn't good, but storing comments in a separate collection is just as bad for the same reasons. Posts with a comments array are, like, the canonical example of a document db. – Matt Briggs Jan 12 '11 at 14:51
  • @SoPeople: storing comments within a post is like the canonical example of document-oriented DBs (like storing the entirety of a wiki text inside one document). If I were to write SO, it would run completely on MongoDB. None of these SO entries is going to *reasonably* exceed 4MB. Craigslist is doing a giant DB migration of their history to MongoDB. They only had a couple of docs go over that limit, and the lead developer suggested that the docs themselves were actually busted (the result of some bugs). Again, 4 megs is several novels of text. – Gates VP Jan 12 '11 at 23:21
  • @Gates VP, what about tag searching? Retrieving the search results would be reasonably fast for SO, but for a site that has lots of large documents, you might need to load and transfer megabytes of data. – mikerobi Jun 07 '11 at 14:42
  • @mikerobi: that 4MB (now 16MB) limit is for a single document. Remember that in MongoDB a "document" is roughly equivalent to a "row". If you have large binary objects, then you should take a look at GridFS for storing them. If you need to search through large corpora of straight text AND this text exceeds 4MB, then MongoDB is not the correct tool (nor are most DBs). To search through a large amount of text, please look to SOLR or Sphinx. – Gates VP Jun 07 '11 at 15:26
  • @Gates VP, I agree about using a separate full-text engine. I was thinking about a metadata search. What if you have a set of Book documents, and you want to find all books published in 1982? If each book has 100+kb of text, you don't want to transfer several megabytes just to display the first 20 book titles. – mikerobi Jun 07 '11 at 16:43