
MongoDB uses the ObjectId type for _id by default.

Will it be bad if I make _id an incrementing integer?

(With this gem, if you're interested)

Asya Kamsky
just so
  • It really depends. There is an argument for "no", because it is still a unique id (auto-incrementing), but there is also one for "yes", because of the maintenance overhead required to keep the id unique (having to query a separate counter collection). It is like having to check the uniqueness of every _id before you insert it; it will eventually hamper the insert rate and create prolonged locks. – Sammaye Dec 27 '12 at 12:05
  • Hmm, so many actions in the DB for this simple feature? =( – just so Dec 27 '12 at 12:07
  • Yeah, quite a few, because MongoDB has no notion of a server-side auto-incrementing id. You can look here for what it takes to make one: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/ In fact, this overhead is one of the reasons why MongoDB does not support this type of id server-side. – Sammaye Dec 27 '12 at 12:08
  • Actually, MongoDB uses the `ObjectID` datatype for its `_id`s. An `ObjectID` is 12 bytes of binary data, not a string. See [MongoDB Documentation](http://docs.mongodb.org/manual/core/object-id/) for more info. – Leonid Beschastny Dec 27 '12 at 12:26

3 Answers


No, it isn't bad at all. In fact, the built-in ObjectId is quite sizeable in the index (12 bytes), so if you believe you have something better, you are welcome to set the _id field to whatever you like.

But, and this is a big but, there are some considerations when moving away from the default ObjectId, especially when using auto-incrementing _ids as shown here: https://docs.mongodb.com/v3.0/tutorial/create-an-auto-incrementing-field

Multithreading isn't much of a problem, because findAndModify is atomic and can take care of that, but this is where you hit your first problem: findAndModify is neither the fastest nor the lightest operation, and significant performance drops have been noticed when it is used regularly.

You also have to consider the overhead of doing this yourself anyway, even without findAndModify. For every insert you will need an extra query. Imagine having a unique id that you have to query the uniqueness of every time you want to insert. Eventually your insert rate will drop to a crawl and your lock time will build up.
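To make the extra round trip concrete, here is a minimal sketch of the counter-collection pattern from the linked tutorial. It assumes the pymongo driver; the `counters` and `target` collections, the `seq` field, and the `target_id` counter name are hypothetical names chosen for illustration.

```python
from typing import Any, Dict


def next_sequence(counters: Any, name: str) -> int:
    """Atomically increment and return the counter stored under `name`.

    `counters` is assumed to be a pymongo Collection (or any object with a
    compatible find_one_and_update method).
    """
    doc = counters.find_one_and_update(
        {"_id": name},          # one counter document per sequence
        {"$inc": {"seq": 1}},   # atomic server-side increment
        upsert=True,            # create the counter on first use
        return_document=True,   # same value as pymongo.ReturnDocument.AFTER
    )
    return doc["seq"]


def insert_with_int_id(counters: Any, target: Any, document: Dict[str, Any]) -> int:
    """Insert `document` with an integer _id; note the two round trips."""
    new_id = next_sequence(counters, "target_id")   # round trip 1: fetch the id
    target.insert_one({**document, "_id": new_id})  # round trip 2: the insert
    return new_id
```

Every insert now depends on a write to the same counter document, which is where the contention described above comes from.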

The ObjectId, by contrast, is designed to be unique without having to check or establish its uniqueness by touching the database before insertion, so it doesn't have this overhead.
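As a rough illustration of why no database round trip is needed, the sketch below builds a 12-byte value following the layout described in the ObjectId documentation: a 4-byte timestamp, a 5-byte random value chosen once per process, and a 3-byte counter. The function and variable names are hypothetical, and this is not the driver's actual implementation; in practice you would let the driver generate the ObjectId.

```python
import os
import struct
import threading
import time

# Chosen once per process, so ids from different processes are very unlikely to collide.
_PROCESS_RANDOM = os.urandom(5)
# The counter starts at a random value and wraps around at 3 bytes.
_counter = int.from_bytes(os.urandom(3), "big")
_counter_lock = threading.Lock()


def new_object_id_bytes() -> bytes:
    """Return 12 bytes laid out like an ObjectId, generated purely client-side."""
    global _counter
    with _counter_lock:
        _counter = (_counter + 1) % 0x1000000
        count = _counter
    timestamp = struct.pack(">I", int(time.time()))  # 4 bytes, big-endian seconds
    return timestamp + _PROCESS_RANDOM + count.to_bytes(3, "big")
```

Because the timestamp, the per-process random value, and the counter together make collisions extremely unlikely, the client can assign the id before the document ever reaches the server.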

If you still feel an integer _id suits your scenario, then go for it, but bear in mind the overhead described above.

Jim U
Sammaye
  • Just pointing out if you're using mongodb for searching document repos and you've opted for the modern and lightning fast simhash method, using a simhash uint64 for the _id is actually ideal; since you're searching you would only ever ask for the simhash anyway, and since you're using simhash for indexing, you would always be using a "get or create" pattern to add to the index, which negates any negatives and makes your ids 33% smaller and faster to compare. – Nick Steele Feb 26 '20 at 02:46

You can do it, but you are responsible for making sure that the integers are unique.

Unlike most SQL databases, MongoDB doesn't support auto-increment fields. When you have a distributed or multithreaded application with multiple processes and/or threads creating new database entries, you have to make sure that they all use the same counter. Otherwise two threads could try to store a document with the same _id in the database.

When that happens, one of them will fail. That means you have to wait for the database to return a success or an error (by calling getLastError or by setting the write concern to acknowledged), which takes longer than just sending data in a fire-and-forget manner.

Philipp

I had a use case for this: replacing _id with a 64-bit integer that represented a simhash of a document index for searching.

Since I intended to "get or create", providing the initial simhash and creating a new record if one didn't exist was perfect. Also, for anyone Googling, MongoDB support explained to me that simhashes are absolutely perfect for sharding and scaling, and even better than the more generic ObjectId, because they divide the data across shards evenly and intrinsically. Storing the key also effectively costs negative space (a uint64 is smaller than an ObjectId, and the simhash would need to be stored anyway).

Also, for you Googlers, replacing a MongoDB _id with something other than an ObjectId is absolutely simple: just create a document with the _id defined; use an integer if you like. That's it: Mongo will simply use it. If you try to create a document with an _id that already exists, you'll get an error (E11000, duplicate key). So if, like me, you're using simhashing, this is ideal in all respects.
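A minimal sketch of that "get or create" pattern, assuming the pymongo driver: `collection` is expected to be a pymongo Collection, and the function names and the `fields` argument are hypothetical. One detail to watch is that BSON stores 64-bit integers as signed values, so an unsigned simhash at or above 2**63 has to be mapped into the signed range first; the two's-complement mapping used here is one possible choice, not something the answer prescribes.

```python
from typing import Any, Dict, Tuple


def simhash_to_id(simhash: int) -> int:
    """Map an unsigned 64-bit value onto the signed 64-bit range BSON can store."""
    if not 0 <= simhash < 2**64:
        raise ValueError("simhash must fit in 64 bits")
    return simhash - 2**64 if simhash >= 2**63 else simhash


def get_or_create(collection: Any, simhash: int, fields: Dict[str, Any]) -> Tuple[int, bool]:
    """Create the document keyed by `simhash` if it is missing.

    Returns (stored _id, created flag). `fields` is assumed to be non-empty and
    is only written when the document is newly inserted.
    """
    doc_id = simhash_to_id(simhash)
    result = collection.update_one(
        {"_id": doc_id},
        {"$setOnInsert": fields},  # ignored when the document already exists
        upsert=True,
    )
    return doc_id, result.upserted_id is not None
```

The upsert folds the existence check and the insert into a single round trip. Concurrent upserts on the same _id can still surface a duplicate key error, so a caller may want to catch it and retry.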

Nick Steele