MongoDB $where queries and tailable cursors -- WAS: date math best practices

Question

My problem: give me a list of documents older than X amount of time.

If I have a document created with:

db.dates.insert({date: new Date()});

And now I want to find it only when the "date" has become 30 minutes old:

db.dates.find({ $where: "this.date.getTime() + 30 * 60000 <= new Date()"});

This works, but in the Mongo documentation states quite clearly that there is a significant performance penalty to $where queries.

Thus the question, is there a better way?

==========UPDATE 1==========

I should have added that I am hoping to have this query function "dynamically" be creating the query one time and using it to obtain a tailable cursor on a capped collection... and I am not sure any longer that it is actually possible.

I will test and repost.

==========UPDATE 2==========

So, looks like my "delayed" queue is going to have to be handled in code, either with polling or some "check, then sleep" algorithm, because that appears to be what mongo's delayed replication is doing (from db.cpp):

if ( replSettings.slavedelay && ( unsigned( time( 0 ) ) < nextOpTime.getSecs() + replSettings.slavedelay ) ) {
    assert( justOne );
    oplogReader.putBack( op );
    _sleepAdviceTime = nextOpTime.getSecs() + replSettings.slavedelay + 1;
    dblock lk;
    if ( n > 0 ) {
        syncedTo = last;
        save();
    }
    log() << "repl:   applied " << n << " operations" << endl;
    log() << "repl:   syncedTo: " << syncedTo.toStringLong() << endl;
    log() << "waiting until: " << _sleepAdviceTime << " to continue" << endl;
    return okResultCode;
}

mnemosyn · Answer 1 · 2011-12-05T23:05:41.427

4

The $lte operator (and other range queries) will work and utilize indexes, but it cannot evaluate expressions. You have to query against a query-time constant (which would be 'now - 30 min'):

var threshold = new Date(); 
threshold.setMinutes(-30);

// now, query against a constant:
db.dates.find({"date" : {$lte : threshold}});

Of course, you can do the same with any driver for any programming language, e.g. C#

var c = db.Collection.Find(Query.LTE("date", DateTime.UtcNow.AddMinutes(-30));

edited Dec 05 '11 at 23:05

answered Dec 05 '11 at 19:15

mnemosyn

45,391
6
76
82

So, my (self imposed) "problem" is that I want to attach this query to a capped collection and use a tailable cursor to obtain the newest matching results. Come to think of it, I was writing this code last night and have not yet unit tested--I will do that now to find out if even my evaluated solution is workable in this use case. – Justin Coffey Dec 06 '11 at 09:41
Interesting. I don't know if tailable cursors are made for that, because the idea of tail is to immediately retrieve new items matching a given criterion, or skip (like an insert 'trigger'). If your criterion is essentially "x is *not* the most recently inserted item (because it's old)", how would tailable cursors be more efficient? On the other hand, the doc says that the op log replication is implemented using tailable cursors, and the op log supports a delay, so it should be possible somehow. – mnemosyn Dec 06 '11 at 10:12

score 0 · Answer 2 · edited May 23 '17 at 11:52

0

A similar question was answered at Query Mongodb on month, day, year... of a datetime

The idea is to query for the end and start ranges of the needed time. Your query can be easily translated into what you need. Just use new Date() on one side and this.date.getTime() + 30 * 60000 on the other.

edited May 23 '17 at 11:52

Community

1
1

answered Dec 05 '11 at 17:29

DrColossos

12,656
3
46
67

So, not sure if this is what you're suggesting, but this does not appear to work: `db.dates.find({ "this.date.getTime()+30*60000": {$lte: "new Date()"} })` – Justin Coffey Dec 05 '11 at 17:46

score 0 · Answer 3 · answered Dec 06 '11 at 15:18

0

So, I should have tested at the beginning! My original query works as expected, even in the case of tailable cursors.

As it turns out Mongo appears to re-evaluate the $where javascript at each insertion of a record into the collection.

I suppose this actually makes complete sense given the fact that you can reference the "this" object--if the javascript was not re-evaled each time then "this" would actually mean "first".

The efficiency of this remains to be seen, especially as more and more tailable cursors get added to the collection.

answered Dec 06 '11 at 15:18

Justin Coffey

51
5

Why use tailable cursors to squeeze out the last 10%, only to use the comparetively expensive js engine and execute `$where`? Wouldn't it be much easier to use a simple indexed field and a vanilla query? – mnemosyn Dec 06 '11 at 15:29
pigheaded-ness? More seriously, the idea is to have as low latency as possible (ie to mimic a message queue) and excessive polling can drive up load and latency... I'm still testing actually and should have a final solution tomorrow – Justin Coffey Dec 06 '11 at 17:21
See your point. I'm interested in your results, plz keep me updated! :) – mnemosyn Dec 06 '11 at 18:59
1

So, a colleague pointed out a problem with this. Can an item "age" and eventually become avaliable? With a tailable cursor a record appears to be evaulated against a query only ONCE. If it does not meet the criteria it is ignored permanently for this cursor. So, this method does not pass all tests and I have to go back to the oplog sleep solution (I will post that in another answer). – Justin Coffey Dec 07 '11 at 10:02

score 0 · Accepted Answer · answered Dec 07 '11 at 10:12

So my earlier answer to my question did not pass all unit tests. It passed all but one:

Inserting a record that will eventually be available while a tailable cursor has been attached to the collection.

The test setup is as follows:

In a separate thread attach a tailable cursor to a collection.
In the main thread insert a document into the collection that should be available in X amount of time.
Sleep the main thread for X amount of time.
See if the thread with the tailable cursor has the document.

The only way I was able to get this to work was by mimicking part of the delayed oplog logic:

if (cursor.hasNext()) {
    DBObject obj = cursor.next();

    if (config.getQueueDelay() > 0) {
        ObjectId objId = (ObjectId) obj.get("_id");
        long advisedSleep = (objId.getTime() + config.getQueueDelay() * 60000)
        - System.currentTimeMillis();
        if (advisedSleep > 0 ) {
            LOG.debug(
                "object is not yet old enough, sleeping for: " 
                 + advisedSleep + "ms"
            );
            Thread.sleep(advisedSleep);
        }
    }
    return obj;
}

This is an okay algorithm in my opinion because so long as the collection in question relies on auto-created ObjectIds we can be 100% certain to have objects in the proper order and if we can't be certain of this, then we can always add an order by to the initial query.

On the other hand, the collection must have an ObjectId (my code has it assumed to be in the _id field of the object, but this could be easily changed to a configurable value).

MongoDB $where queries and tailable cursors -- WAS: date math best practices

4 Answers4