4

In MongoDB, a field can have multiple values (an array of values). Each of them is indexed, so you can filter on any of the values. But can you also "order by" a field with multiple values and what is the result?

Update:

> db.test.find().sort({a:1})
{ "_id" : ObjectId("4f27e36b5eaa9ebfda3c1c53"), "a" : [ 0 ] }
{ "_id" : ObjectId("4f27e3845eaa9ebfda3c1c54"), "a" : [ 0, 1 ] }
{ "_id" : ObjectId("4f27df6e5eaa9ebfda3c1c4c"), "a" : [ 1, 1, 1 ] }
{ "_id" : ObjectId("4f27df735eaa9ebfda3c1c4d"), "a" : [ 1, 1, 2 ] }
{ "_id" : ObjectId("4f27df795eaa9ebfda3c1c4e"), "a" : [ 2, 1, 2 ] }
{ "_id" : ObjectId("4f27df7f5eaa9ebfda3c1c4f"), "a" : [ 2, 2, 1 ] }
{ "_id" : ObjectId("4f27df845eaa9ebfda3c1c50"), "a" : [ 2, 1 ] }
{ "_id" : ObjectId("4f27e39a5eaa9ebfda3c1c55"), "a" : [ 2 ] }

With unequal length arrays the longer array is "lower" than the shorter array

So, why is [0] before [0,1], but [2] after [2,1] ? Is maybe sorting only done on the first array element? Or the lowest one? And after that it is insertion order?

Also, how is this implemented in the case of an index scan (as opposed to a table scan)?

Thilo
  • 257,207
  • 101
  • 511
  • 656

1 Answers1

7

Sorting of array elements is pretty complicated. Since array elements are indexed seperately sorting on an array field will actually result in some interesting situations. What happens is that MongoDB will sort them based on the lowest or highest value in the array (depending on sort direction). Beyond that the order is natural.

This leads to things like :

> db.test.save({a:[1]})
> db.test.save({a:[0,2]})
> db.test.find().sort({a:1})
{ "_id" : ObjectId("4f29026f5b6b8b5fa49df1c3"), "a" : [ 0, 2 ] }
{ "_id" : ObjectId("4f2902695b6b8b5fa49df1c2"), "a" : [ 1 ] }
> db.test.find().sort({a:-1})
{ "_id" : ObjectId("4f29026f5b6b8b5fa49df1c3"), "a" : [ 0, 2 ] }
{ "_id" : ObjectId("4f2902695b6b8b5fa49df1c2"), "a" : [ 1 ] }

In other words. The same order for reversed sorts. This is due to the fact that the "a" field of the top document holds both the lowest and the highest value.

So effectively for the sort MongoDB ignores all values in the array that are not either the highest ({field:-1} sort) or the lowest ({field:1} sort) and orders the remaining values.

To paint an (oversimplified) picture it works something like this :

flattened b-tree for index {a:1} given above sample docs :

"a" value 0 -> document 4f29026f5b6b8b5fa49df1c3
"a" value 1 -> document 4f2902695b6b8b5fa49df1c2
"a" value 2 -> document 4f29026f5b6b8b5fa49df1c3

As you can see scanning from both top to bottom and bottom to top will result in the same order.

Empty arrays are the "lowest" possible array value and thus will appear at the top and bottom of the above queries respectively.

Indexes do not change the behaviour of sorting on arrays.

Remon van Vliet
  • 18,365
  • 3
  • 52
  • 57
  • Is that still the case when an index is used? – Thilo Jan 31 '12 at 12:31
  • Yes, this is one of those instances where indexes do not change the outcome of queries. Sadly, this is not always the case in MongoDB (which frankly I consider somewhat of a bug). – Remon van Vliet Jan 31 '12 at 12:36
  • Just checked. Creating an index does not change the sort order (which of course is good). I wonder how this is implemented. – Thilo Jan 31 '12 at 12:37
  • Do you have an example for when the presence of an index leads to a buggy outcome? That the order can change when no sort order is specified is not a bug, all DBMS do that. Also happens when elements get added or updated, for example. – Thilo Jan 31 '12 at 12:39
  • 1
    No I was talking more generally. There are some queries that give different results. For example, using sparse indexes makes count() queries with $exists:false clauses return 0 even if this is not factually accurate. – Remon van Vliet Jan 31 '12 at 12:49
  • "There are some queries that give different results": That would be a real bug. Mongo should refuse to use the index in that case. – Thilo Jan 31 '12 at 12:52
  • 1
    Actually this is a more relevant issue : https://jira.mongodb.org/browse/SERVER-3918 – Remon van Vliet Jan 31 '12 at 15:03
  • Please check my updated answer ;) Things are slightly different than I said. – Remon van Vliet Feb 01 '12 at 09:12
  • 1
    Thanks, that's what I thought. Accepting again. And this needs to be documented on the Mongo site. – Thilo Feb 01 '12 at 09:20
  • It most definitely should. I'll ask of someone at 10gen can add it. – Remon van Vliet Feb 01 '12 at 09:27