
I am trying to retrieve 100,000 documents from MongoDB as shown below, and it's taking very long to return the collection.

var query = Query.EQ("Status", "E");
var items = collection.Find(query).SetLimit(100000).ToList();

Or

var query = Query.GT("_id", idValue);
var items = collection.Find(query).SetLimit(100000).ToList();

Explain:

{
    "cursor" : "BtreeCursor _id_",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "_id" : [[ObjectId("4f79a64eca98b5fc0e5ae35a"),
                  ObjectId("4f79a64eca98b5fc0e5ae35a")]]
    }
}

Any suggestions to improve query performance? My collection has 2 million documents.


i3arnon
Venkat
  • Please try to make your questions a bit more informative. What is "a long time"? What execution time would you expect? What are your indexes (if any)? Etc. The more information you give, the more helpful the answers can be. Since you say the "_id" query is just as slow, I'm going to assume it's not an index issue, since _id is always indexed. – Remon van Vliet Apr 03 '12 at 13:58
  • _id is an indexed ObjectId column; we are not using MapReduce. It's taking 3 minutes to execute the above statement. Our problem is simple: we have 2 million documents in one collection and we want to retrieve 100,000 documents as fast as possible. – Venkat Apr 04 '12 at 07:23

5 Answers


This question was also asked on Google Groups:

https://groups.google.com/forum/?fromgroups#!topicsearchin/mongodb-user/100000/mongodb-user/a6FHFp5aOnA

As I responded on the Google Groups question I tried to reproduce this and was unable to observe any slowness. I was able to read 100,000 documents in 2-3 seconds, depending on whether the documents were near the beginning or near the end of the collection (because I didn't create an index).

My answer to the Google Groups question has more details and a link to the test program I used to try to reproduce this.

Robert Stam

Given the information you have provided, my best guess is that your document size is too large and the delay is not necessarily on the MongoDB server but in transmitting the result set back to your application machine. Take a look at your average document size in the collection; do you have large embedded arrays, for example?

Compare the response time when selecting only one field using the .SetFields method (see example here: How to retrieve a subset of fields using the C# MongoDB driver?). If the response time is significantly faster, then you know that this is the issue.
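For example, with the legacy 1.x C# driver the projection might look like this (a sketch; it reuses the `Status` query and `collection` variable from the question):

```csharp
// Sketch using the legacy 1.x C# driver's Fields builder.
// Only the Status field is sent over the wire; all other fields
// (including any large embedded arrays) stay on the server.
var query = Query.EQ("Status", "E");
var items = collection.Find(query)
    .SetFields(Fields.Include("Status"))
    .SetLimit(100000)
    .ToList();
```

If this version returns in a fraction of the original time, the bottleneck is document size and transfer, not the index.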

Zaid Masud

If you don't create indexes for your collection, MongoDB will do a full collection scan, which is the slowest possible method.

You can run explain() for your query. Explain will tell you which indexes (if any) are used for the query, the number of scanned documents, and the total query duration.

If your query hits all the indexes and its execution is still slow, then you probably have a problem with the size of the collection relative to RAM.

MongoDB is fastest when the collection data plus indexes fit in memory. If your collection size is larger than the available RAM, the performance drop is very large.

You can check the size of your collection with totalSize(), totalIndexSize(), or validate() (these are shell commands).
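Put together, those checks might look like this in the mongo shell (a sketch; the collection name `items` is a placeholder):

```javascript
// mongo shell sketch: is the query indexed, and does the
// collection fit in RAM? ("items" is a placeholder name)
db.items.find({ Status: "E" }).limit(100000).explain();
db.items.totalSize();      // data size plus index size, in bytes
db.items.totalIndexSize(); // index size only
db.items.validate();       // detailed structural report
```

Compare totalSize() against the RAM available to mongod; if the collection plus indexes don't fit, page faults during the scan will dominate the query time.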

Christian P

Have you defined indices?

http://www.mongodb.org/display/DOCS/Indexes
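If not, with the legacy 1.x C# driver an index on the queried field can be created like this (a sketch; it assumes the `Status` field from the question):

```csharp
// Sketch (legacy 1.x C# driver): create an ascending index on Status
// so Query.EQ("Status", "E") no longer requires a full collection scan.
collection.EnsureIndex(IndexKeys.Ascending("Status"));
```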

user590028

There are several things to check:

  1. Is your query correctly indexed?
  2. If your query is indexed, what are the odds that the data itself is in memory? If you have 20GB of data and 4GB of RAM, then most of your data is not in memory which means that your disks are doing a lot of work.
  3. How much data do 100k documents represent? If your documents are really big, they could be consuming all of the available disk IO, or possibly the network. Do you have enough space to store this in RAM on the client?

You can check for disk usage using iostat (a common Linux tool) or perfmon (under Windows). If you run these while your query is running, you should get some idea about what's happening with your disks.

Otherwise, you will have to do some reasoning about how much data is moving around here. In general, queries that return 100k objects are not intended to be really fast (not in MongoDB or in SQL). That's more data than humans typically consume in one screen, so you may want to make smaller batches and read 10k objects 10 times instead of 100k objects once.
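A rough sketch of that batching approach with the legacy 1.x C# driver (assuming the `Status` query from the question; the 10,000 batch size is arbitrary):

```csharp
// Sketch: page through matching documents in 10k chunks by
// remembering the last _id seen, instead of one 100k read.
BsonValue lastId = BsonMinKey.Value;
while (true)
{
    var query = Query.And(
        Query.EQ("Status", "E"),
        Query.GT("_id", lastId));
    var chunk = collection.Find(query)
        .SetSortOrder(SortBy.Ascending("_id"))
        .SetLimit(10000)
        .ToList();
    if (chunk.Count == 0) break;
    lastId = chunk[chunk.Count - 1]["_id"];
    // process this chunk before fetching the next one
}
```

Ranging on `_id` rather than using Skip keeps each batch an indexed range scan, so later batches are no slower than the first.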

Gates VP
  • As per the requirement, we need to get all records with that status, do some calculations, and update back to MongoDB. The collection holds 2 million documents. I am using the MongoVUE tool, and it can retrieve 100,000 documents very fast. – Venkat Apr 03 '12 at 10:08
  • It sounds like you need to be running a Map/Reduce. What is "very slow"? Is it taking seconds? minutes? hours? – Gates VP Apr 03 '12 at 18:41
  • _id is an indexed ObjectId column; we are not using MapReduce. It's taking 3 minutes to execute the above statement. Our problem is simple: we have 2 million documents in one collection and we want to retrieve 100,000 documents as fast as possible. – Venkat Apr 04 '12 at 07:18
  • OK, so three minutes is typically too long. Did you check disk IO? Did you check memory? How big are the documents? You are using `ToList()`, that means all of the documents have to live in memory on the client. Does the client have resources for this? – Gates VP Apr 04 '12 at 08:07
  • The client has enough memory; each document has only 5 properties (columns) holding simple strings and decimal values. We need the full collection in memory to work with the records. Using the MongoVUE admin tool, 100,000 documents can be retrieved in 00:00:01.703 seconds. – Venkat Apr 04 '12 at 09:19
  • @Venkat did you try executing the query from the console using explain to see the detailed report? Is the query also slow from the console? – Christian P Apr 04 '12 at 09:33
  • Explain - giving us: { "cursor" : "BtreeCursor _id_", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "_id" : [[ObjectId("4f79a64eca98b5fc0e5ae35a"), ObjectId("4f79a64eca98b5fc0e5ae35a")]] } } – Venkat Apr 05 '12 at 07:34
  • @Venkat In the explain output, "millis" : 0 shows that the query executed in 0 milliseconds, and "nscanned" shows that only one item was examined. This query does not return 100,000 objects and executes immediately. Is this the right query? – Christian P Apr 05 '12 at 11:31