65

It seems to me that when you are creating a Mongo document and have a field {key: value} which is sometimes not going to have a value, you have two options:

  1. Write {key: null} i.e. write null value in the field
  2. Don't store the key in that document at all

Both options are easily queryable, in one you query for {key : null} and the other you query for {key : {$exists : false}}.

I can't really think of any differences between the two options that would have any impact in an application scenario (except that option 2 has slightly less storage).

Can anyone tell me if there are any reasons one would prefer either of the two approaches over the other, and why?

EDIT

After asking the question it also occurred to me that indexes may behave differently in the two cases i.e. a sparse index can be created for option 2.

Hendy Irawan
  • 20,498
  • 11
  • 103
  • 114
Zaid Masud
  • 13,225
  • 9
  • 67
  • 88
  • 3
    For what it's worth, "slightly less storage" can be more significant at huge scale, particularly when you're talking the size of working sets in memory. – Christopher Sep 13 '12 at 14:54

5 Answers5

42

Indeed you have also a third possibility : key: "" (empty value)

And you forget a specificity about null value. Query on key: null will retrieve you all document where key is null or where key doesn't exist.

When a query on $exists:false will retrieve only doc where field key doesn't exist.

To go back to your exact question it depends of you queries and what data represent. If you need to keep that, by example, a user set a value then unset it, you should keep the field as null or empty. If you dont need, you may remove this field.

Aurélien B
  • 4,590
  • 3
  • 34
  • 48
  • 17
    I will stay away from key: " " personally since this is implying that it is a string. If you retrieve such a field and you do something like isKeyExists(key) this will pass as true even though it's an empty string. So let's say you were expecting a boolean that room for errors. – Sani Yusuf Jan 20 '17 at 17:30
  • You have even a forth possibility : `key: undefined`. Query would be different, see https://stackoverflow.com/questions/68255363/or-with-if-and-in-mongodb/68255564#68255564 – Wernfried Domscheit Nov 13 '22 at 11:24
21

Note that, since MongoDB doesnt use field name dictionary compression, field:null consumes disk space and RAM, while storing no key at all doesnt consume resources.

Samuel García
  • 2,199
  • 14
  • 21
10

It really comes down to:

  • Your scenario
  • Your querying manner
  • Your index needs
  • Your language

I personally have chosen to store null keys. It makes it much easier to integrate into my app. I use PHP with Active Record and uisng null values makes my life a lot easier since I am not having to put the stress of field depedancy upon the app. Also I do not need to make any complex code to deal with magics to set non-existant variables.

I personally would not store an empty value like "" since if your not careful you could have two empty values null and "" and then you'll have a hap-hazard time of querying specifically. So I personally prefer null for empty values.

As for space and index: it depends on how many rows might not have this colum but I doubt you will really notice the index size increase due to a few extra docs with null in. I mean the difference in storage is mineute especially if the corresponding key name is small as well. That goes for large setups too.

I am quite frankly unsure of the index usage between $exists and null however null could be a more standardised method by which to query the existance since remember that MongoDB is schemaless which means you have no requirement to have that field in the doc which again produces two empty values: non-existant and null. So better to choose one or the other.

I choose null.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • Thanks... but I'm not convinced that using null makes the app code any easier (at least not in C# driver that I'm using). There is no complex code as non-existent keys automatically deserialize to null in the C# driver. – Zaid Masud Sep 14 '12 at 10:57
  • @ZaidMasud Yes in C# as well as driver issues you have the strongly typed language. This is why this is such a hard question to answer since there is no standard across all platforms, unlike CSS or HTML or OOP etc it is all down to you really. – Sammaye Sep 14 '12 at 11:17
2

Another point you might want to consider is when you use OGM tools like Hibernate OGM.

If you are using Java, Hibernate OGM supports the JPA standard. So if you can write a JPQL query, you would be theoretically easy if you want to switch to an alternate NoSQL datastore which is supported by the OGM tool.

JPA does not define a equivalent for $exists in Mongo. So if you have optional attributes in your collection then you cannot write a proper JPQL for the same. In such a case, if the attribute's value is stored as NULL, then it is still possible to write a valid JPQL query like below.

SELECT p FROM pppoe p where p.logout IS null;
Vinod
  • 81
  • 9
  • Querying for null in MongoDB includes missing values. [From official doc](https://docs.mongodb.com/manual/tutorial/query-for-null-fields/): "The `{ item : null }` query matches documents that either contain the item field whose value is null *or that do not contain the item field*." – Zaid Masud Apr 18 '19 at 14:43
  • MongoDB indeed supports querying for nulls, but the point was about functional equivalence with JPQL. If it was a missing attribute, it would not be possible to write an equivalent JPQL. – Vinod Apr 24 '19 at 11:08
  • I understand, I am just saying I think that `SELECT p FROM pppoe p where p.logout IS null;` will include *missing attributes* in the result. – Zaid Masud Apr 25 '19 at 12:58
0

I think in terms of disk space the difference is negligible. If you need to create an index on this field then consider Partial Index.

In index with { partialFilterExpression: { key: { $exists: true } } } can be much smaller than a normal index.

Also should be noted, that queries look different, see values like this:

db.collection.insertMany([
  { _id: 1, a: 1 }, 
  { _id: 2, a: '' }, 
  { _id: 3, a: undefined }, 
  { _id: 4, a: null }, 
  { _id: 5 }
])
db.collection.aggregate([
   {
      $set: {
         type: { $type: "$a" },
         ifNull: { $ifNull: ["$a", true] },
         defined: { $ne: ["$a", undefined] },
         existing: { $ne: [{ $type: "$a" }, "missing"] }
      }
   }   
])
   
{ _id: 1, a: 1,         type: double,    ifNull: 1,    defined: true,  existing: true }
{ _id: 2, a: "",        type: string,    ifNull: "",   defined: true,  existing: true }
{ _id: 3, a: undefined, type: undefined, ifNull: true, defined: false, existing: true }
{ _id: 4, a: null,      type: null,      ifNull: true, defined: true,  existing: true }
{ _id: 5,               type: missing,   ifNull: true, defined: false, existing: false }

Or with db.collection.find():

db.collection.find({ a: { $exists: false } })
  { _id: 5 }

db.collection.find({ a: { $exists: true} })
  { _id: 1, a: 1 }, 
  { _id: 2, a: '' }, 
  { _id: 3, a: undefined }, 
  { _id: 4, a: null }

db.collection.find({ a: null })
  { _id: 3, a: undefined }, 
  { _id: 4, a: null },
  { _id: 5 }

db.collection.find({ a: {$ne: null} })
  { _id: 1, a: 1 }, 
  { _id: 2, a: '' }, 

db.collection.find({ a: {$type: "null"} })
  { _id: 4, a: null }
Wernfried Domscheit
  • 54,457
  • 9
  • 76
  • 110