So what I think is that there may be some mongodb command to get documents by index keys based on their position in the index. The point is that I can create covered compound index like this:
No there isn't such a function in MongoDB, though it is a good idea to be able to randomise a result set. In the meantime here is a JIRA: https://jira.mongodb.org/browse/SERVER-533
Since there is no way to pick from a position of index so that it can use an index and consequently a single round trip you have no choice but you open multiple cursors.
The current solution depends on how many documents are in your result set.
If there are a small number of document in your result set you can solve this with an easy skip(rand())
and limit(1)
however you must be aware that both skip()
and limit()
make no effective use of indexes.
That does not mean it will scan the entire Btree it means it will scan as far as you skip()
.
This means that if your result set grows large and the rand()
becomes a large number you will see serious performance problems, as many have.
One good way of possibly solving this is to maintain either:
And use that new field to "skip" on using the rest of your query like:
var arr = db.articles.find({topic: 3, rand: rand()}, {_id:1}).limit(7).toArray();
Will get 7 random rows using the 0
to 1
idea.
The random sort abilities of this rely on an ever changing data set to help create a randomness within the sort. This will, of course, not work if the result set is continously static.
As for using batchSize, it becomes irrelavant here and infact normally does. For example your logic of using BatchSize to get all your results back does not make complete sense since the BatchSize normally has a absolute max size of 16MB. This means that if your documents are large you may not be getting the single round trip you think you are.
This also only dictates that the server will send all this data at once, it does not denote the amount of work placed on the server, just the amount of data sent over the wire at once.
So considering that you must do this with mutliple cursors (the way I would recommend) you can just run:
var arr = db.articles.find({topic: 3, rand: {$gte:rand()}}).sort({rand:1}).limit(1);
Several, or however many you need, times over. This is not much different to a normal iteration of a cursor and provided you have the right indexes should be quite fast.
There is one other method, but I do not recommend it. You can run an MR, say, once every hour or something which creates another collection of _id
and rand()
this means you can do the first query I showed:
var arr = db.articles.find({topic: 3, rand: rand()}, {_id:1}).limit(7).toArray();
And really get 7 random records, since the rand()
will, of course, be different. But this is not real time and nor would it be very nice for your server on a large data set as such I do not recommend such a thing.
Edit
There is one other way. With the auto incrementing id you could do an $or
statement to pick out 7 rand()
s all at once. However this introduces yet another problem, deleting.
If you delete any rows you might hit a rand()
that doesn't exist and so no row will be returned. Since the auto incrementing id is not maintained to counter deletes server-side you would have to do this yourself. This would not be an easy or scalable thing to do.
To add to this $or
statements cannot be limit()
ed on clause which means you can't get around this by doing sub select type $or
s to make MongoDB only pick out one result per $or
clause using $gte
.
The same applies for a rand()
between 0
and 1
. This would work with an $or
if you could limit clauses.