1

I have a MongoDB collection with various data in it. (about millions)

These data have a data struct like {k: {a:1,b:2,c:{},...}} and I don't know extactly what in it.

Now I wanna do a counting on this collection to return me the total elements in the collection that k is not empty by using {k:{$exists:true}} but that's turns out very slow ...

Then I add an index on k and trying to query by : {k:{$gt:{}} but that's not return the correct results.

So, how to do this counting on the collection now?

Note that I don't know the data structure of k.

jackalope
  • 1,554
  • 3
  • 17
  • 37
  • Can you show us an explain of your query with an index and also tell us how many documents you are expecting to work on? Normally a good way is to query on a subfield you always know will exist if the field is not empty. – Sammaye Dec 14 '12 at 12:24
  • What version of mongodb are you using? – Alex Dec 14 '12 at 12:43

3 Answers3

1

If you are using a version before version 2, $exists is not able to use an index. See this answer: https://stackoverflow.com/a/7503114/131809

So, try upgrading your version of MongoDB

From the docs:

Before v2.0, $exists is not able to use an index. Indexes on other fields are still used.
$exists is not very efficient even with an index, and esp. with {$exists:true} since it will effectively have to scan all indexed values.

The second part of that is perhaps the important bit.

It sounds like sparse index may be the key here...

Community
  • 1
  • 1
Alex
  • 37,502
  • 51
  • 204
  • 332
0

Try using $ne : null

So, as per your code example:

{k:{$ne : null}}
Alex
  • 37,502
  • 51
  • 204
  • 332
0
db.collection.count({k:{$ne:null}})

By the way use sparse index on k.

db.collection.ensureIndex({k:1}, {sparse: true});
Sushant Gupta
  • 8,980
  • 5
  • 43
  • 48
  • That will be more inefficient as `null` in his case, if querying by `null` is slow for him then that definitely will be – Sammaye Dec 14 '12 at 12:44
  • That would counting all document even k is null. – jackalope Dec 14 '12 at 12:45
  • No it won't... Look carefully what I wrote. If you want to see how it works behind the scenes... Replace `count` by `find` and do explain() over cursor. This will get you what you want. db.collection.find(...).explain() You would see btree cursor is used and nscanned is also equal to number of k. – Sushant Gupta Dec 14 '12 at 12:46
  • Check out the modified answer. – Sushant Gupta Dec 14 '12 at 12:49
  • 1
    sparse will omit null fields, how does that help? that means that the data he is querying for will not be in the index... – Sammaye Dec 14 '12 at 12:49
  • But for counting the documents, it works just fine. I just created a sample db and did for myself. What I mean is, for queries like these he can surely use a sparse index. – Sushant Gupta Dec 14 '12 at 12:51
  • @jackalope Can k be null and would you want it to be counted as well? – Sushant Gupta Dec 14 '12 at 12:53
  • k can be null and surely I don't wanna null counted. – jackalope Dec 14 '12 at 12:54
  • Ok, got your point. Sorry... Bad answer. Should have thought. – Sushant Gupta Dec 14 '12 at 12:54
  • I suppose, I am unsure since I have not checked this, that a sparse index in this case could diff the collection count and the index count, though the index count is not exact so I would not imagine that to be the case. – Sammaye Dec 14 '12 at 12:54
  • But using sparse index for only for this type of query won't be a problem. Can someone explain me how it is a problem. – Sushant Gupta Dec 14 '12 at 12:56
  • Sparse omits null fields, hence why it is so useful for unique indexes to be sparse so that null fields don't throw a unique exception, it means that the data is not in index when it is queried by `null` or empty which means it is the same as not using an index, should be anyway. – Sammaye Dec 14 '12 at 12:58
  • But won't it be iterating over only those documents where k is defined. Say he has 100K docs in all of which he has 50 docs having k. Won't 50 be only iterated that way if using sparse index. – Sushant Gupta Dec 14 '12 at 12:59
  • But he is looking for where `k` isn't there, that means the other 99,550 will need scanning otherwise the index will give incorrect results, infact I am unsure what using a spare index in this case will do, I should test that, but logic dictates that the index shouldn't do that. – Sammaye Dec 14 '12 at 13:04
  • Is there anyway we could determine how many documents are indexed by an index. That way it would be constant time complexity if using sparse index. – Sushant Gupta Dec 14 '12 at 13:04
  • What he wrote in question suggests that he wants to count docs having `k` – Sushant Gupta Dec 14 '12 at 13:06
  • hehe... this was the most rigorous discussion on SO I had till date. I enjoyed it. Thanks @Sammaye and jackalope. – Sushant Gupta Dec 14 '12 at 13:08