1

My main object looks like this:

{
"_id" : ObjectId("56eb0a06560fd7047318465a"),
 ...
"intervalAbsenceDates" : [
    DBRef("ScheduleIntervalContainer", ObjectId("56eb0a06560fd7047318463b")),
    DBRef("ScheduleIntervalContainer", ObjectId("56eb0a05560fd70473184467")),
    DBRef("ScheduleIntervalContainer", ObjectId("56eb0a05560fd70473184468")),
    DBRef("ScheduleIntervalContainer", ObjectId("56eb0a05560fd70473184469")),

An embedded ScheduleIntervalContainer object looks like this:

{
"_id" : ObjectId("56eb0a06560fd7047318463b"),
"end" : ISODate("2022-08-23T07:06:00Z"),
"available" : true,
"confirmation" : true,
"start" : ISODate("2022-08-19T09:33:00Z")
}

Now I will query all ScheduleIntervalContainers where start and end is in a range. I have tried a lot but I not even can query one ScheduleIntervalContainer by id. This is my approach:

db.InstitutionUserConnection.find( { 
"intervalAbsenceDates" : { 
    "$ref" : "ScheduleIntervalContainer", 
    "$id" : ObjectId("56eb0a05560fd7047318446d")
}
})

Could anyone give me a hint how to query all ScheduleIntervalContainers which have start and end in a time range.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
quma
  • 5,233
  • 26
  • 80
  • 146
  • What do you mean by *"within a time range"*? Are you trying to inspect a `Date` field on the referenced document or are you trying to do this based on the timestamp portion of the linked `ObjectId`? Moreover, would that latter be enough? MongoDB does not *really* do joins, so there isn't a way to "see" the referenced document properties in a query. There are modern ways with different refrence storage than `DBRef`. Also to mention that `DBRef` is not a really great idea. – Blakes Seven Mar 17 '16 at 23:47
  • Just an FYI. For single selection, the reason why your example query does not work is because it does not match the BSON type for `DBRef`. Values like `$ref` are just the serialized form of what the object looks like, and not actually hard fields available for query. Instead you would use the BSON type for `DBRef` directly: `dbInstitutionUserConnection.find({ "intervalAbsenceDates": DBRef("ScheduleIntervalContainer", ObjectId("56eb0a05560fd70473184469")) })` – Blakes Seven Mar 17 '16 at 23:50

1 Answers1

5

Using DBRef is fraught with problems and the usage is somewhat "antiquated" as it was more of a "shoehorn" solution to requests to provide a mechanism for providing references to external collection data than a really well thought solution.

The general recommendation is to not use DBRef and rather simply use a plain ObjectId or other unique identifier and resolve the actual "collection" or even "database container" with other information, being either another standard property stored in the document or just a plain external reference in your code which identifies the target collection.

The Basic BSON Problem

A good reason for this is that people often make the common mistake ( just like you have ) of interpretting the "serialized" output from a stored object containing a DBRef to actually be "properties" present on the object. This is not true, as in fact the object has it's own BSON type, just like Date and ObjectId. The only time that serialized form is valid is when used with a "strict mode" JSON parser, which would yeild the actual DBRef object.

The manual itself is not particularly helpful with that fact and says "DBRefs have the following fields". This is misleading since those properties are not actually available as fields for query, and would not be exposed other than object inspection available to JavaScript processing of $where or mapReduce. And not a good approach to use either of those for "query" purposes.

So those properties are not available for query, but you can of course specifiy the BSON form directly from your API. Such as:

db.InstitutionUserConnection.find({ 
    "intervalAbsenceDates": DBRef(
        "ScheduleIntervalContainer",  
        ObjectId("56eb0a05560fd70473184469")
    ) 
})

Which resolves and matches correctly since the correct BSON form was sent in a query and the element will actually match.

From this principle we can then move on to the the concept of "ranges" as asked in the question.

MongoDB does not "really" do Joins

Until recent releases ( MongoDB 3.2.x series ) the statement was actually "MongoDB does not do joins", and this has been a general distinction in design philosophy away from relational databases.

The general mantra "has" been that "joins are costly" and therefore do not scale well in distributed data systems such as what MongoDB is principally designed for.

So if you are asking for referencing a property in the document which the DBRef refers to, then you are basically out of luck. The only possible action for filtering results based on an external property in this case would be:

  1. Look at all the data in the master collection, either loaded to query in whole or processed individually

  2. For each retrieved document, lookup and expand the DBRef values to their target collection data.

  3. Filter out documents whose expanded data from external references do not actually meet the conditions.

This means all the "expansion" and "filtering" must take place on the "client" to the database rather than on the server itself. There simply is no mechanism to do this, so you end up pulling a lot of data over the network connection.

With a DBRef in place this is purely not possible to perform on the server even with modern releases. Again it's the same BSON type problem, since the "source" contains DBRef types and the "target" contains ObjectId types.

If however you can simply live with the fact that your "range" is looking at the "creation date" data that would be inherently present in any ObjectId, then there is of course another approach that does not involve a "join".

Filtering on ObjectId "range"

Every ObjectId starts with 4-bytes that represents the current timestamp value ( excluding milliseconds ) at the time the ObjectId was created. This is generally a good indicator of the time of "insertion" for the document in question where it was used.

From this you can determine that the "created date" of the documents refernced with DBRef is approximately equal to that portion of the ObjectId value used by that document in the target collection. This allows you to basically construct a "range" value for ObjectId values that would fall between the given range. As a consequence, you can then construct DBRef BSON objects that would work with range operators:

// Define start and end dates to query
var dateStart = new Date("2016-03-17T19:48:21Z"), // equal to 56eb0a05
    dateEnd = new Date("2016-03-17T19:48:25Z");   // equal to 56eb0a09

// Convert to hex and pad to ObjectId length
var startRange = new ObjectId(
    ( dateStart.valueOf() / 1000 ).toString(16) + "0000000000000000"
),
// Yields ObjectId("56eb0a050000000000000000")
endRange = new ObjectId(
    ( dateEnd.valueOf() / 1000 ).toString(16) + "ffffffffffffffff"
);
// Yields ObjectId("56eb0a09ffffffffffffffff")

// Now query with contructed DBRef values

db.InstitutionUserConnection.find({ 
    "intervalAbsenceDates": {
        "$elemMatch": {
            "$gte": DBRef("ScheduleIntervalContainer",startRange),
            "$lt": DBRef("ScheduleIntervalContainer",endRange),
        }
    }
})

So as long as "created" is what you are looking for, then that method should suffice for selecting the matching parents without first expanding the DBRef values in the array for further inpection.

The Reverse Case Lookup

Of course the other case here is to simply query the "joined" collection first and then look for documents in the "master" collection that contain the ObjectId values within the DBRef. This does of course mean multiple queries to be issued, but it does cure the case of expanding every DBRef just to match the related properties:

// Create array of matching DBRef values
var refs = db.ScheduleIntervalContainer.find({
    "start" { "$lte": targetDate },
    "end": { "$gte": targetDate }
}).map(function(doc) {
    return DBRef("ScheduleIntervalContainer",doc._id)
});

// Find documents that match the DBRef's within the array
db.InstitutionUserConnection.find({ 
    "intervalAbsenceDates": { "$in": refs } 
})

The practicallity of this varies on the number of matches from the related collection resulting in the array that would be passed to $in, but it does actually yield the desired result.

Actually doing Joins

I mention earlier that "modern" MongoDB releases now have an approach to "joining" data from different collections. This is the $lookup aggregation pipeline operator.

But while this can be used to "join" data, the current usage of DBRef does not work here. I did also mention earlier that the basic problem is the data in the array is a DBRef, but the data in the referenced collection is instead an ObjectId.

So if you wanted to use a $lookup approach, then you would first need to use plain ObjectId values in place of the existing DBRef values:

{
  "_id" : ObjectId("56eb0a06560fd7047318465a"),
  "intervalAbsenceDates" : [
    ObjectId("56eb0a06560fd7047318463b"),
    ObjectId("56eb0a05560fd70473184467"),
    ObjectId("56eb0a05560fd70473184468"),
    ObjectId("56eb0a05560fd70473184469")
  ]
}

With data in that structure you could then use $lookup and other aggregation pipeline methods to just return the documents that actually match the value of a related property. I.e "end" within the related object:

 db.InstitutionUserConnection.aggregate([
     // Presently you need to unwind the array first
     { "$unwind": "$intervalAbsenceDates" },

     // Then $lookup to get a resulting array of matches for each member
     { "$lookup": {
         "from": "ScheduleIntervalContainer",
         "localField": "intervalAbsenceDates",
         "foreignField": "_id",
         "as": "absenceDates"
     }},

     // unwind the array result field as well
     { "$unwind": "$absenceDates" },

     // Now reform the documents
     { "$group": { 
         "_id": "$_id",
         "intervalAbsenceDates": { "$push": "$absenceDates" }
     }},

     // Then query on the "end" property for the range
     { "$match": { 
         "intervalAbsenceDates": {
             "$elemMatch": {
                 "end": {
                     "$gte": new Date("2016-03-23"),
                     "$lt": new Date("2016-03-24")
                 }
             }
         }
     }}
 ])

The current behaviour of $lookup is that you cannot directly process on an array property in the document, so the procedure shown in "$lookup on ObjectId's in an array" is used to replace the current array with the expanded objects from the other collection.

Once the operations here actually produce a document that now has that related data embedded, it's a straightforward process of looking at the properties of the documents within the array to see if they match the query conditions.

Conclusion

This all should show that DBRef is not a good idea for storing references. Whist it is possible as demonstrated to work around the problem using the ObjectId values, you generally want to have a plain ObjectId or other key value as the reference and resolve them by other means. And even if the workaround is sufficient, it works just the same with plain ObjectId values or anything else that presents a natural range.

When it comes to using the "values" of refernced properties in such a "join", then of course regardless of using DBRef or other value, it is not a possibilty without $lookup to use that in query conditions on the server. All data would first need to be loaded to the client and then resolved with additional queries to the database before those properties can be inspected for filtering.

Since the mechanism of $lookup will in fact result in a form that will look exactly like if you "embedded" the data in the first place, then "embedding" is most often the correct approach, since the data is already present in the source collection and available for query.

There is a lot of "scare media" around regarding the BSON limit of 16MB and saying this is why you keep data in another collection. Sometimes this does indeed apply, but most of the time it does not. Afterall, 16MB is really quite a large amount of data, and more than most would actually use in general applications.

Citation from MongoDB: The Definitive Guide

To give you an idea of how much 16MB is, the entire text of War and Peace is just 3.14MB.

To examine for queries to need to get to an "embedded form" anyway, and it is arguable that if you can store an array of DBRef or ObjectId or whatever as embedded data, then storing all of the content they are actually pointing to is not really that much more of a stretch.

The general lesson is that you should be designing based on the actual usage patterns your applcation applies to the data. If you are querying "related data" all of the time, then it makes most sense to keep that data all in the one collection. Of course other factors apply, but always keep in mind the trade-off in performance considerations by what you are doing.

Community
  • 1
  • 1
Blakes Seven
  • 49,422
  • 14
  • 129
  • 135