64

I'm a little confused by MongoDB's use of ObjectIds. Sure, they're great for creating IDs client-side that almost certainly won't conflict with other client-side-created IDs. But Mongo seems to store them in some special way. Storing a string representation of the ID is different from storing an ObjectId as an object. Why is this?

Doesn't the string form have all the same information that the object form has? Why does Mongo go to such lengths to differentiate those two forms? It trips me up when I try to compare _ids sent from a frontend, for example. My database is in no way consistent about whether it stores string-form IDs or object-form IDs, and though my code is certainly partially to blame, I mostly blame Mongo for making this so weird.

Am I wrong that this is weird? Why does mongo do it this way?

B T

2 Answers

46

I convert to string in code to compare, and I ensure that anything that looks like an ObjectId is actually used as an ObjectId.

It is worth noting that there is a 12-byte difference between the ObjectId (http://docs.mongodb.org/manual/reference/object-id/) and its hex representation: the ObjectId is 12 bytes and its hex representation is 24.
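To make the size difference concrete, here is a minimal sketch using only the Python standard library. The 12 random bytes are a stand-in for a real ObjectId's raw value, not something produced by a MongoDB driver.

```python
import os

# 12 random bytes stand in for a real ObjectId's raw value (hypothetical data).
raw_id = os.urandom(12)

# The hex form spells each byte as two characters.
hex_id = raw_id.hex()

print(len(raw_id))  # 12 (bytes, as a binary value)
print(len(hex_id))  # 24 (characters, as a hex string)

# The string does carry the same information: it converts back losslessly.
assert bytes.fromhex(hex_id) == raw_id
```

So the two forms are interchangeable in content, but the string form is twice as long before any per-value string overhead is counted.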

It is not only about storage efficiency but also about indexes. The index is smaller, and because an ObjectId begins with a timestamp, newly generated values are roughly increasing, so only the parts of the index that are actually used need to be loaded. This is most noticeable when inserting, where only the latest part of the index needs to be loaded to check uniqueness. You cannot guarantee such behaviour with its hex representation.
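The "latest part of the index" point follows from how an ObjectId is laid out. As a rough illustration, the sketch below builds a 12-byte ObjectId-like value by hand, assuming the layout described in the linked reference (4-byte big-endian timestamp, 5-byte random value, 3-byte counter). `make_objectid_like` is a hypothetical helper written for this example, not a driver function.

```python
import os
import struct
import time

# Per-process random value and a randomly seeded counter (illustrative only).
_process_random = os.urandom(5)
_counter = int.from_bytes(os.urandom(3), "big")


def make_objectid_like(timestamp=None):
    """Build a 12-byte value shaped like an ObjectId (hypothetical helper)."""
    global _counter
    if timestamp is None:
        timestamp = int(time.time())
    _counter = (_counter + 1) % (1 << 24)  # counter wraps at 3 bytes
    return (
        struct.pack(">I", timestamp)       # 4 bytes: seconds since the epoch
        + _process_random                  # 5 bytes: random per process
        + _counter.to_bytes(3, "big")      # 3 bytes: incrementing counter
    )


older = make_objectid_like(timestamp=1_600_000_000)
newer = make_objectid_like(timestamp=1_600_000_001)

# The timestamp is the leading bytes, so a later id sorts after an earlier one.
assert older < newer

# The creation time can be read back from the first four bytes.
print(struct.unpack(">I", newer[:4])[0])  # 1600000001
```

Because the leading bytes grow with time, new ids land next to each other at the high end of a sorted index rather than being scattered across it.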

I would strongly recommend you do not use the ObjectId's hex representation. If you want to "make your life easier" you would be better off creating a different _id which is smaller but somehow just as unique and index friendly.

Just a guy
Sammaye
  • 2
    You shouldn't need to convert ObjectIds to strings to compare them. They do have an equals method (at least in mongo native / mongoskin). – B T Jan 12 '15 at 20:43
  • The reference you have says that ObjectIds are 12 bytes, not 16 – B T Jan 12 '15 at 20:44
  • @BT whops my bad, must have been thinking of something else – Sammaye Jan 12 '15 at 20:50
  • @BT if the framework supports that then that brilliant, I use PHP personally – Sammaye Jan 12 '15 at 20:51
  • According to https://stackoverflow.com/a/27274609/6440033, the string representation of an ObjectId should be 29 bytes (24 * UTF8-encoded chars of a low range so single-byte each + 4 bytes length + 1 byte string termination char) – dnickless Oct 10 '19 at 10:35
  • @Sammaye does that mean PHP doesn't support ObjectIds? Just seems odd to me that NodeJs would support ObjectIds but not PHP. – Tom Sep 17 '21 at 12:33
  • 1
    @Tom this answer is like 6 years old but neither "support" ObjectId but both have a helper which the driver wraps the `_id` field in which can provide certain functions, I believe I was referring to the difference in implementation between mongoose and PHP framework ORMs, however, since then a whole new PHP driver has been released. – Sammaye Sep 17 '21 at 12:54
9

ObjectId is 12 bytes when it's stored internally, which is more compact than the hexadecimal string representation of it. The two are different things.

To solve this, you can migrate your whole DB to a uniform _id format and make sure your code always saves in that same format. ObjectIds are fast for MongoDB to generate, so I'd use them when creating new documents.
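One way to "make sure your code saves in the same format" is to convert known id fields at the boundary where data enters your application. The following is only a sketch under stated assumptions: `normalize_ids` and its parameters are hypothetical names, the field list is supplied by you rather than guessed, and the default converter is `bytes.fromhex` so the example runs without a driver installed. With PyMongo you would presumably pass `bson.ObjectId` as `to_object_id` instead.

```python
import re

# A 24-character hex string is the textual form of a 12-byte id.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def normalize_ids(doc, id_fields, to_object_id=bytes.fromhex):
    """Return a copy of doc with the named id fields converted from strings.

    Hypothetical helper: only fields listed in id_fields are touched, so an
    ordinary string that happens to look like an id is never converted.
    """
    result = dict(doc)
    for field in id_fields:
        value = result.get(field)
        if isinstance(value, str):
            if not _HEX24.fullmatch(value):
                raise ValueError(f"{field!r} is not a 24-character hex id: {value!r}")
            result[field] = to_object_id(value)
    return result


# Example payload as it might arrive from a frontend (made-up values).
incoming = {
    "_id": "507f1f77bcf86cd799439011",
    "author_id": "507f191e810c19729de860ea",
    "title": "abcdefabcdefabcdefabcdef",  # looks like an id, but is not one
}
clean = normalize_ids(incoming, ["_id", "author_id"])
```

Listing the fields explicitly is a deliberate choice here: the database has no schema to tell an id from a string that merely resembles one, so the application has to say which fields are ids.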

bakkal
  • Ah so its all about storage efficiency then? All my _ids are ObjectIds, but lots of reference fields aren't right now, I suppose I should fix things up – B T Jan 12 '15 at 07:28
  • 1
    Field size also might affect the indexing/querying performance (not sure by how much for `ObjectId` vs `ObjectId.str`, but MongoDB was probably built/tested with ObjectId so let's use that :D) – bakkal Jan 12 '15 at 07:35
  • 1
    This is actually pretty annoying since there's no real good way to serialize object ids. Any requests that come from the client have to be explicitly combed to replace string ids with object ids. Is there a way around that? – B T Jan 12 '15 at 07:41
  • Depends on your web stack/framework since it involves the client, so I'd post a separate question with the details – bakkal Jan 12 '15 at 07:48
  • 1
    Theoretically, couldn't mongo (client or server) detect when a string is a valid ObjectId hex value and automatically convert it? Then the user wouldn't need to differentiate, and the only overhead would be a minor deserialization/serialization cost – B T Jan 12 '15 at 20:46
  • 1
    That opens you to accidental conversion of a random string that by mere chance passes as a valid ObjectId when you actually planned for `_id` to be a string (which MongoDB allows for as a use case). You're taking this way overboard. When you use primary keys as a sequential integer in MySQL or PostgreSQL, you put ints like 12345 you don't put "12345" as a string. MongoDB cannot guess your use case since you don't define a schema like you do in a RDBMS, don't ask for auto conversion. RDBMS's can because you pre-define a schema. On MongoDB these things are left to your stack (e.g. web/client) – bakkal Jan 12 '15 at 21:14
  • 1
    This can apply to other fields/types too not just _id, e.g. dates, strings and dates (like `ISODate`) are not to be confounded. These responsibilities usually fall into your upper stack, e.g. the REST interface that has some knowledge of what to expect (e.g. implicitly via a schema definition or explicit conversions depending on your coding style) – bakkal Jan 12 '15 at 21:28
  • As long as mongo converted all objectids back to strings, there'd be no problem with "accidental" conversion of random strings. But point taken – B T Jan 12 '15 at 21:33
  • I think `mongoose` for `node.js` converts these strings automatically at least in some cases (like when searching by `_id`, it will convert your string to an ObjectId for you). Question: Let's imagine I have a field on some schema and lets say it's called `author_id`, which corresponds to the user who wrote that post or review or something. Is it best to make that field an `ObjectId`, or a `String` type? (Please consider it will also be an index). – Will Brickner May 04 '17 at 04:21
  • (I use mongoose which uses schemas, not sure if it's relevant here) – Will Brickner May 04 '17 at 04:22