6

I have a collection like this:

{
"_id" : ObjectId("51f4ad560364f5490ccebe26"),
"fiTpcs" : [
    "uuid1",
    "uuid2",
    "uuid3",
    "uuid4",
    "uuid5"
],
"fiTpcsCnt" : 5
}

The list of fiTpcs is long and can grow to hundreds later. When I retrieve my collection, I want to get a limited list of fiTpcs, say 20 at a time, and fire separate requests to get subsequent data from fiTpcs. I just want to ensure that the queries don't get slow later when I have a lot more data. Is there a way to do it in MongoDB? Until now, I have been doing

db.userext.find({"_id" : ObjectId("51f4ad560364f5490ccebe26")}).pretty();

which always gets me the full fiTpcs array. I am using the Java driver with Spring, and a solution using Spring/Java would also be fine. Please note: if the solution requires Mongo to scan through the whole fiTpcs array and then slice a part of it, it doesn't really add any performance benefit, so that is not what I am looking for.

Jayz
  • 1,174
  • 2
  • 19
  • 43
  • Look for Skip and Limit in your Java driver, and if it's not using an index, you'll need to create one to make it reasonably efficient. – WiredPrairie Aug 10 '13 at 01:48
  • 1
    But, if you have all of the data in a huge array, you should look at `$slice`. – WiredPrairie Aug 10 '13 at 01:52
  • Skip and Limit are not good in terms of performance. Check http://stackoverflow.com/questions/5049992/mongodb-paging and http://stackoverflow.com/questions/7228169/slow-pagination-over-tons-of-records-in-mongo. I can use range-based paging, but how do I do it for subdocuments? – Jayz Aug 10 '13 at 04:09
  • Just saw a suggestion on another [question](http://stackoverflow.com/questions/16752917/subdocuments-pagination-in-mongoose) that maybe using the aggregation framework with $unwind on your sub documents would work. – Alistair Nelson Aug 10 '13 at 11:00
  • I have tested with the aggregation framework with $unwind; unfortunately, it is not great wrt performance either :( – Jayz Aug 10 '13 at 12:06
  • 1
    What sort of performance are you looking for? Do you have tests to show that one approach is substantially better than another? In many cases you'll find that what appears to be low performance before it's implemented is actually perfectly acceptable in a production-like environment. – Trisha Aug 11 '13 at 13:10
  • @Trisha, I stand corrected. Aggregate queries and $slice do perform well enough for a prod-like environment (although performance degrades with huge data). I am trying to squeeze the maximum I can, and also trying to make sure my documents (in rare scenarios) don't grow beyond the 16MB size limit. – Jayz Aug 11 '13 at 18:01

2 Answers

10

I may not understand your question in full depth, but it seems like `$slice` is the droid you are looking for:

> db.page.find()
{ "_id" : ObjectId("51f4ad560364f5490ccebe26"), "fiTpcs" : [ "uuid1", "uuid2", "uuid3", "uuid4", "uuid5" ], "fiTpcsCnt" : 5 }
> db.page.find({}, {"fiTpcs" : {$slice : 3}})
{ "_id" : ObjectId("51f4ad560364f5490ccebe26"), "fiTpcs" : [ "uuid1", "uuid2", "uuid3" ], "fiTpcsCnt" : 5 }
> db.page.find({}, {"fiTpcs" : {$slice : [1,3]}})
{ "_id" : ObjectId("51f4ad560364f5490ccebe26"), "fiTpcs" : [ "uuid2", "uuid3", "uuid4" ], "fiTpcsCnt" : 5 }
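For the Java driver, the same projection can be built with a nested `$slice` document; a minimal sketch (the driver call is shown only in a comment, since it needs a running server, while the runnable part below just mirrors the server-side `$slice : [skip, limit]` semantics on a plain list to make the behavior concrete):

```java
import java.util.Arrays;
import java.util.List;

public class SliceDemo {
    // Mirrors MongoDB's {$slice : [skip, limit]} projection on an array field.
    // With the (legacy) Java driver the equivalent projection would look roughly like:
    //   DBObject proj = new BasicDBObject("fiTpcs", new BasicDBObject("$slice", new int[]{skip, limit}));
    //   coll.find(new BasicDBObject("_id", id), proj);
    // (illustration only; not executed here)
    static <T> List<T> slice(List<T> array, int skip, int limit) {
        int from = Math.min(Math.max(skip, 0), array.size());
        int to = Math.min(from + limit, array.size());
        return array.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> fiTpcs = Arrays.asList("uuid1", "uuid2", "uuid3", "uuid4", "uuid5");
        System.out.println(slice(fiTpcs, 1, 3)); // [uuid2, uuid3, uuid4]
        System.out.println(slice(fiTpcs, 4, 3)); // [uuid5]
    }
}
```

Note that `$slice` only trims what is returned over the wire; the server still loads the whole document, which is why it eventually degrades for very large arrays.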
madhead
  • 31,729
  • 16
  • 153
  • 201
0

After a couple of days of thinking/trying various options, this is what I did finally. I modified my document like this:

{
  "_id" : ObjectId("51f4ad560364f5490ccebe26"),
  "page" : 1,  //1 is the default
  "slug" : "some-unique-string-identifier",
  "fiTpcs" : [
    "uuid1",   //these could be long text, like a long comment/essay
    "uuid2",
    "uuid3",
    "uuid4",
    "uuid5"
  ],
  "fiTpcsCnt" : 5
}

I keep a "pageCount" and "totalFiTpcsCnt" in memcached. I have set MAX_FITPCSCNT = 500 (500 for now, experimental). When I create a new document of type userext, I set the page value to 1.

If I have to push a new object to fiTpcs array:

1) Check if "totalFiTpcsCnt" is a multiple of 500. If yes, create a new document of type userext with the same slug, fiTpcsCnt as 0 and fiTpcs array as null.

2) Update the last userext: query by slug and "pageCount", push to fiTpcs. Evict cache for "pageCount" and "totalFiTpcsCnt".

Whenever I need my userext document, I always take just the first page. This way I'll never need to query for more than 500 fiTpcs entries at a time, and totalFiTpcsCnt always stays up to date in memcached.
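The page bookkeeping above is just arithmetic on the cached count, so it can be sketched in plain Java (names such as `needsNewPage` and `targetPage` are mine, not from the answer):

```java
public class PageBookkeeping {
    static final int MAX_FITPCSCNT = 500; // page capacity from the answer

    // True when the next push must first create a fresh userext page document
    // (the running count is an exact multiple of the page capacity).
    static boolean needsNewPage(long totalFiTpcsCnt) {
        return totalFiTpcsCnt % MAX_FITPCSCNT == 0;
    }

    // 1-based page number the next push should target,
    // assuming any needed new page has been created first.
    static long targetPage(long totalFiTpcsCnt) {
        return totalFiTpcsCnt / MAX_FITPCSCNT + 1;
    }

    public static void main(String[] args) {
        System.out.println(needsNewPage(0));   // true  -> create the first document
        System.out.println(needsNewPage(500)); // true  -> page 2 must be created
        System.out.println(targetPage(499));   // 1
        System.out.println(targetPage(501));   // 2
    }
}
```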

Jayz
  • 1,174
  • 2
  • 19
  • 43
  • I'll keep this question open for a few days to see if anyone can suggest a better solution. – Jayz Aug 11 '13 at 17:43
  • 2
    This is not what you've asked. There is no _pagination_ of `fiTpcs` in it, just splitting the documents. – madhead Aug 12 '13 at 12:20