45

I need to create a new field sid on each document in a collection of about 500K documents. Each sid is unique and based on that record's existing roundedDate and stream fields.

I'm doing so with the following code:

var cursor = db.getCollection('snapshots').find();
var iterated = 0;
var updated = 0;

while (cursor.hasNext()) {
    var doc = cursor.next();

    if (doc.stream && doc.roundedDate && !doc.sid) {
        db.getCollection('snapshots').update({ "_id": doc['_id'] }, {
            $set: {
                sid: doc.stream.valueOf() + '-' + doc.roundedDate,
            }
        });

        updated++;
    }

    iterated++;
}; 

print('total ' + cursor.count() + ' iterated through ' + iterated + ' updated ' + updated);

It works well at first, but after a few hours and about 100K records it errors out with:

Error: getMore command failed: {
    "ok" : 0,
    "errmsg": "Cursor not found, cursor id: ###",
    "code": 43,
}: ...


General Grievance
Chava Sobreyra
  • 5
    I would have thought your biggest problem was this line `doc.stream && doc.roundedDate && !doc.sid`. You are just passing `.find()` with no query expression at all, and instead you are filtering **all** documents in code. Let the **database** do the work by moving to the query `.find({ "roundedDate": { "$exists": true }, "stream": { "$exists": true }, "sid": { "$exists": false } })`. You should also be using [`.bulkWrite()`](https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/) every 500 or so items, rather than actually executing each statement on the server. – Neil Lunn May 30 '17 at 04:11
  • 2
    Those two simple things 1. Don't iterate unnecessary items, 2. use bulk commits to avoid server acknowledgement overhead. Should save "hours" off your current process. You can make other alterations to keep the cursor alive, but the best thing to do is basically reduce the time to processing anyway. Maybe even break up the selection of documents by "range" on the `_id` to reduce the possible selection even further. – Neil Lunn May 30 '17 at 04:14
  • Great information @NeilLunn. I was looking at bulkWrite() -- the problem in my particular case with using bulkWrite() and updateMany() is that I wouldn't have access to each record, and would be unable to create the sid field. I was looking for an approach like this, though. I'm able to get all the records assembled/updated locally in an array using mongoose, but I couldn't find a way to batch insert/update/overwrite them in the DB. – Chava Sobreyra May 30 '17 at 05:34
  • 1
    I think you misunderstand the usage of `.bulkWrite()`. Take a look at [this answer](https://stackoverflow.com/a/37280419/2313887) as an example. Messing with cursor timeouts will only take you so far. What you really need to do is reduce the overheads. – Neil Lunn May 30 '17 at 05:43
  • @NeilLunn Thanks for pointing that out, I completely missed that and it is actually possible that the query can be executed before the cursor expires, especially if there are indexes available to match the `.find(...)` query. – Danziger May 31 '17 at 23:45

6 Answers

119

EDIT - Query performance:

As @NeilLunn pointed out in his comments, you should not be filtering the documents manually, but use .find(...) for that instead:

db.snapshots.find({
    roundedDate: { $exists: true },
    stream: { $exists: true },
    sid: { $exists: false }
})

Also, using .bulkWrite(), available since MongoDB 3.2, will be far more performant than doing individual updates.
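To illustrate how the two suggestions combine, here is a minimal sketch, not a definitive implementation. The function name `addSidInBulk` and the `batchSize` parameter are made up for this example, and the default of 500 simply follows the figure mentioned in the comments. It assumes `collection` behaves like a mongo shell collection (synchronous `find()`, `hasNext()`, `next()` and `bulkWrite()`):

```javascript
// Sketch: let the server filter the documents, then send the updates in batches.
// `addSidInBulk` and `batchSize` are hypothetical names used only in this example.
function addSidInBulk(collection, batchSize = 500) {
    const cursor = collection.find({
        roundedDate: { $exists: true },
        stream: { $exists: true },
        sid: { $exists: false }
    });

    let ops = [];
    let updated = 0;

    while (cursor.hasNext()) {
        const doc = cursor.next();

        // Each document still gets its own sid, built from its own fields.
        ops.push({
            updateOne: {
                filter: { _id: doc._id },
                update: { $set: { sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }` } }
            }
        });

        // Flush a full batch in a single round trip.
        if (ops.length === batchSize) {
            collection.bulkWrite(ops, { ordered: false });
            updated += ops.length;
            ops = [];
        }
    }

    // Flush whatever is left over.
    if (ops.length > 0) {
        collection.bulkWrite(ops, { ordered: false });
        updated += ops.length;
    }

    return updated;
}
```

In the shell you would call it as `addSidInBulk(db.snapshots)`. Note that you still have access to every record while building the operations, so the per-document sid is not a problem.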

It is possible that, with that, you are able to execute your query within the 10-minute lifetime of the cursor. If it still takes longer than that, your cursor will expire and you will have the same problem anyway, which is explained below:

What is going on here:

Error: getMore command failed may be due to a cursor timeout, which is related to two cursor attributes:

  • Timeout limit, which is 10 minutes by default. From the docs:

    By default, the server will automatically close the cursor after 10 minutes of inactivity, or if client has exhausted the cursor.

  • Batch size, which is 101 documents or 16 MB for the first batch, and 16 MB, regardless of the number of documents, for subsequent batches (as of MongoDB 3.4). From the docs:

    find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.

You are probably consuming those initial 101 documents and then getting a 16 MB batch, which is the maximum, with many more documents. As processing them takes more than 10 minutes, the cursor times out on the server and, by the time you are done with the second batch and request a new one, the cursor is already closed:

As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() will perform a getMore operation to retrieve the next batch.


Possible solutions:

I see 5 possible ways to solve this: 3 good ones, with their pros and cons, and 2 bad ones:

  1. Reducing the batch size to keep the cursor alive.

  2. Remove the timeout from the cursor.

  3. Retry when the cursor expires.

  4. Query the results in batches manually.

  5. Get all the documents before the cursor expires.

Note they are not numbered following any specific criteria. Read through them and decide which one works best for your particular case.


1. Reducing the batch size to keep the cursor alive

One way to solve that is to use cursor.batchSize to set the batch size on the cursor returned by your find query to a number of documents that you can process within those 10 minutes:

const cursor = db.collection.find()
    .batchSize(NUMBER_OF_DOCUMENTS_IN_BATCH);

However, keep in mind that setting a very conservative (small) batch size will probably work, but will also be slower, as now you need to access the server more times.

On the other hand, setting it to a value too close to the number of documents you can process in 10 minutes means that it is possible that if some iterations take a bit longer to process for any reason (other processes may be consuming more resources), the cursor will expire anyway and you will get the same error again.
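As a rough illustration of that trade-off, the following sketch derives a batch size from a measured processing rate. The function name `estimateBatchSize` and the `safetyFactor` parameter are hypothetical, and the rate has to come from your own measurements:

```javascript
// Sketch: choose a batch size that leaves headroom before the cursor times out.
// `estimateBatchSize` and `safetyFactor` are hypothetical names for this example.
function estimateBatchSize(docsPerSecond, timeoutSeconds = 600, safetyFactor = 0.5) {
    // Documents you could process in one timeout window, scaled down for safety.
    const docs = Math.floor(docsPerSecond * timeoutSeconds * safetyFactor);

    // Never return less than 1, which would not be a usable batch size.
    return Math.max(1, docs);
}
```

For example, if you measure about 5 documents per second, `estimateBatchSize(5)` gives 1500, which you could pass to `.batchSize(...)`. A lower `safetyFactor` makes a timeout less likely at the cost of more round trips.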


2. Remove the timeout from the cursor

Another option is to use cursor.noCursorTimeout to prevent the cursor from timing out:

const cursor = db.collection.find().noCursorTimeout();

This is considered a bad practice as you would need to close the cursor manually or exhaust all its results so that it is automatically closed:

After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or by exhausting the cursor’s results.

As you want to process all the documents in the cursor, you wouldn't need to close it manually, but it is still possible that something else goes wrong in your code and an error is thrown before you are done, thus leaving the cursor opened.

If you still want to use this approach, use a try-catch to make sure you close the cursor if anything goes wrong before you consume all its documents.
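A minimal sketch of that idea, assuming a mongo shell style cursor with synchronous `hasNext()`, `next()` and `close()`. The names `processWithoutTimeout` and `handleDoc` are made up for this example, and it uses try-finally rather than try-catch so the original error still propagates:

```javascript
// Sketch: always close a noCursorTimeout cursor, even if processing throws.
// `processWithoutTimeout` and `handleDoc` are hypothetical names.
function processWithoutTimeout(collection, handleDoc) {
    const cursor = collection.find().noCursorTimeout();
    let processed = 0;

    try {
        while (cursor.hasNext()) {
            handleDoc(cursor.next());
            ++processed;
        }
    } finally {
        // Runs on success and on error, so the cursor is never left open.
        cursor.close();
    }

    return processed;
}
```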

Note I don't consider this a bad solution, even though it is considered a bad practice:

  • It is a feature supported by the driver. If it were that bad, it would not be supported, given that there are alternative ways to get around timeout issues, as explained in the other solutions.

  • There are ways to use it safely, it's just a matter of being extra cautious with it.

  • I assume you are not running this kind of query regularly, so the chances that you start leaving open cursors everywhere are low. If this is not the case, and you really need to deal with these situations all the time, then it does make sense not to use noCursorTimeout.


3. Retry when the cursor expires

Basically, you put your code in a try-catch and when you get the error, you get a new cursor skipping the documents that you have already processed:

let processed = 0;
let updated = 0;

while(true) {
    const cursor = db.snapshots.find().sort({ _id: 1 }).skip(processed);

    try {
        while (cursor.hasNext()) {
            const doc = cursor.next();

            ++processed;

            if (doc.stream && doc.roundedDate && !doc.sid) {
                db.snapshots.update({
                    _id: doc._id
                }, { $set: {
                    sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }`
                }});

                ++updated;
            } 
        }

        break; // Done processing all, exit outer loop
    } catch (err) {
        if (err.code !== 43) {
            // Something else than a timeout went wrong. Abort loop.

            throw err;
        }
    }
}

Note you need to sort the results for this solution to work.

With this approach, you are minimizing the number of requests to the server by using the maximum possible batch size of 16 MB, without having to guess how many documents you will be able to process in 10 minutes beforehand. Therefore, it is also more robust than the previous approach.
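A possible variation, sketched here under the same mongo shell assumptions, resumes from the last `_id` seen instead of using skip(), since skip() has to walk over all the skipped documents again on every retry. The names `processWithResume`, `handleDoc` and `maxRetries` are hypothetical, and the retry cap is an addition that is not part of the code above:

```javascript
// Sketch: retry on error code 43 (CursorNotFound), resuming after the last _id seen.
// `processWithResume`, `handleDoc` and `maxRetries` are hypothetical names.
function processWithResume(collection, handleDoc, maxRetries = 5) {
    let lastId = null;
    let processed = 0;
    let retries = 0;

    while (true) {
        // On the first pass read everything; afterwards only what comes after lastId.
        const query = lastId === null ? {} : { _id: { $gt: lastId } };
        const cursor = collection.find(query).sort({ _id: 1 });

        try {
            while (cursor.hasNext()) {
                const doc = cursor.next();

                handleDoc(doc);
                lastId = doc._id;
                ++processed;
            }

            return processed; // Done processing all documents.
        } catch (err) {
            // Rethrow anything that is not a cursor timeout, or give up after too many retries.
            if (err.code !== 43 || ++retries > maxRetries) {
                throw err;
            }
        }
    }
}
```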


4. Query the results in batches manually

Basically, you use skip(), limit() and sort() to do multiple queries with a number of documents you think you can process in 10 minutes.

I consider this a bad solution because the driver already has the option to set the batch size, so there's no reason to do this manually, just use solution 1 and don't reinvent the wheel.

Also, it is worth mentioning that it has the same drawbacks as solution 1.
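For completeness, manual paging could look roughly like this. It is only a sketch, assuming a mongo shell style cursor with a synchronous `toArray()`; the names `processInPages`, `pageSize` and `handleDoc` are hypothetical:

```javascript
// Sketch: page through the collection manually with sort(), skip() and limit().
// `processInPages`, `pageSize` and `handleDoc` are hypothetical names.
function processInPages(collection, pageSize, handleDoc) {
    let skipped = 0;

    while (true) {
        // Each query opens a new cursor that is drained immediately,
        // so no cursor sits idle while the documents are being processed.
        const page = collection.find()
            .sort({ _id: 1 })
            .skip(skipped)
            .limit(pageSize)
            .toArray();

        if (page.length === 0) {
            break; // No more documents.
        }

        for (const doc of page) {
            handleDoc(doc);
        }

        skipped += page.length;
    }

    return skipped;
}
```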


5. Get all the documents before the cursor expires

Probably your code is taking some time to execute due to results processing, so you could retrieve all the documents first and then process them:

const results = db.snapshots.find().toArray();

This will retrieve all the batches one after another and close the cursor. Then, you can loop through all the documents inside results and do what you need to do.

However, if you are having timeout issues, chances are that your result set is quite large, thus pulling everything in memory may not be the most advisable thing to do.


Note about snapshot mode and duplicate documents

It is possible that some documents are returned multiple times if intervening write operations move them due to a growth in document size. To solve this, use cursor.snapshot(). From the docs:

Append the snapshot() method to a cursor to toggle the “snapshot” mode. This ensures that the query will not return a document multiple times, even if intervening write operations result in a move of the document due to the growth in document size.

However, keep in mind its limitations:

  • It doesn't work with sharded collections.

  • It doesn't work with sort() or hint(), so it will not work with solutions 3 and 4.

  • It doesn't guarantee isolation from insertion or deletions.

Note that with solution 5 the time window in which a document move could cause duplicates to be retrieved is narrower than with the other solutions, so you may not need snapshot().

In your particular case, as the collection is called snapshots, it is probably not likely to change, so you probably don't need snapshot(). Moreover, you are updating documents based on their own data and, once a document is updated, it will not be updated again even if it is retrieved multiple times, as the if condition will skip it.


Note about open cursors

To see a count of open cursors use db.serverStatus().metrics.cursor.
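As a small sketch of how to read that output, the helper below picks out the counters most relevant to this problem. The function name `summarizeOpenCursors` is made up; the field names are the ones serverStatus reports under metrics.cursor:

```javascript
// Sketch: extract the cursor counters from a serverStatus result.
// `summarizeOpenCursors` is a hypothetical helper name.
function summarizeOpenCursors(serverStatus) {
    const cursor = serverStatus.metrics.cursor;

    return {
        open: cursor.open.total,                   // cursors currently open
        openWithoutTimeout: cursor.open.noTimeout, // open cursors created with noCursorTimeout
        timedOut: cursor.timedOut                  // cursors that timed out since the server started
    };
}
```

In the shell you would call it as `summarizeOpenCursors(db.serverStatus())`. A growing `openWithoutTimeout` value suggests that cursors from solution 2 are not being closed.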

Danziger
  • Can the `Cursor not found` error arise from scenarios other than a timeout? – mils Jan 23 '18 at 22:36
  • I guess rarely, but if you do a quick search you will see it has happened a few times due to bugs in the drivers: https://github.com/meteor/meteor/issues/7763, https://github.com/go-mgo/mgo/pull/295 – Danziger Jan 24 '18 at 02:46
  • Just to be super clear, is the cursor timeout reset on the next batch retrieval (i.e. next time the client gets a batch of 101 documents)? Thanks – mils Jan 28 '18 at 23:22
  • 1
    @mils Yes, if you request a new batch, and the cursor has not expired already since the previous request (you are still within the 10 minutes window), it will be reset, so from that point on, you have 10 minutes to make another request (and reset the cursor again) before it is closed. – Danziger Jan 29 '18 at 01:41
  • Wow, what a nice answer! Thanks a lot! – softarn Apr 22 '21 at 08:30
  • "Retry when the cursor expires" is the option that is the best option for me – Jose A Lopez Pastor Jan 18 '22 at 07:32
4

It's a bug in MongoDB server session management. A fix is currently in progress and should be available in 4.0+.

SERVER-34810: Session cache refresh can erroneously kill cursors that are still in use

(reproduced in MongoDB 3.6.5)

Adding collection.find().batchSize(20) helped me, at the cost of slightly reduced performance.

vovchisko
  • 1
    Please don't add the same answer to multiple questions. Answer the best one and flag the rest as duplicates. See [Is it acceptable to add a duplicate answer to several questions?](//meta.stackexchange.com/q/104227/206345) – Machavity Jun 07 '18 at 18:46
  • It's also worth noting that this is going to have significant performance hits if you're working with large collections with lots of data, as having a reduced batch size obviously means that you're going to have to make more trips to the database and back in order to get the next batches and process the same amount of data. – Olly John Aug 19 '20 at 08:27
4

I also ran into this problem, but for me it was caused by a bug in the MongoDB driver.

It happened in the version 3.0.x of the npm package mongodb which is e.g. used in Meteor 1.7.0.x, where I also recorded this issue. It's further described in this comment and the thread contains a sample project which confirms the bug: https://github.com/meteor/meteor/issues/9944#issuecomment-420542042

Updating the npm package to 3.1.x fixed it for me, because I had already taken into account the good advice given here by @Danziger.

SimonSimCity
1

When using Java v3 driver, noCursorTimeout should be set in the FindOptions.

DBCollectionFindOptions options = new DBCollectionFindOptions()
    .maxTime(90, TimeUnit.MINUTES)
    .noCursorTimeout(true)
    .batchSize(batchSize)
    .projection(projectionQuery);
cursor = collection.find(filterQuery, options);
user1240792
1

In my case, it was a load-balancing issue. I had the same problem with a Node.js service and mongos running as a pod on Kubernetes. The client was using the mongos service with default load balancing. Changing the Kubernetes service to use sessionAffinity: ClientIP (stickiness) resolved the issue for me.

Maoz Zadok
1

noCursorTimeout will NOT work

As of 2021, for the error

cursor id xxx not found, full error: {'ok': 0.0, 'errmsg': 'cursor id xxx not found', 'code': 43, 'codeName': 'CursorNotFound'}

the official documentation says:

Consider an application that issues a db.collection.find() with cursor.noCursorTimeout(). The server returns a cursor along with a batch of documents defined by the cursor.batchSize() of the find(). The session refreshes each time the application requests a new batch of documents from the server. However, if the application takes longer than 30 minutes to process the current batch of documents, the session is marked as expired and closed. When the server closes the session, it also kills the cursor despite the cursor being configured with noCursorTimeout(). When the application requests the next batch of documents, the server returns an error.

This means that, even if you have set:

  • noCursorTimeout=True
  • a smaller batchSize

you will still get cursor id not found after the default 30 minutes.

How to fix/avoid cursor id not found?

Make sure of two things:

  • (Explicitly) create a new session, and get the db and collection from that session.
  • Refresh the session periodically.

code:

  • (official) js
var session = db.getMongo().startSession()
var sessionId = session.getSessionId().id
var cursor = session.getDatabase("examples").getCollection("data").find().noCursorTimeout()
var refreshTimestamp = new Date() // take note of time at operation start
while (cursor.hasNext()) {
  // Check if more than 5 minutes have passed since the last refresh
  if ( (new Date()-refreshTimestamp)/1000 > 300 ) {
    print("refreshing session")
    db.adminCommand({"refreshSessions" : [sessionId]})
    refreshTimestamp = new Date()
  }
  // process cursor normally
}
  • (mine) python
import logging
from datetime import datetime
import pymongo

mongoClient = pymongo.MongoClient('mongodb://127.0.0.1:27017/your_db_name')

# Refresh the session every 8 minutes.
#   Note: this must be less than 30 minutes, the default MongoDB session timeout:
#       https://docs.mongodb.com/v5.0/reference/method/cursor.noCursorTimeout/
RefreshSessionPerSeconds = 8 * 60

def mergeHistorResultToNewCollection():

    mongoSession = mongoClient.start_session() # <pymongo.client_session.ClientSession object at 0x1081c5c70>
    mongoSessionId = mongoSession.session_id # {'id': Binary(b'\xbf\xd8\xd...1\xbb', 4)}

    mongoDb = mongoSession.client["your_db_name"] # Database(MongoClient(host=['127.0.0.1:27017'], document_class=dict, tz_aware=False, connect=True), 'your_db_name')
    mongoCollectionOld = mongoDb["collecion_old"]
    mongoCollectionNew = mongoDb['collecion_new']

    # historyAllResultCursor = mongoCollectionOld.find(session=mongoSession)
    historyAllResultCursor = mongoCollectionOld.find(no_cursor_timeout=True, session=mongoSession)

    lastUpdateTime = datetime.now() # datetime.datetime(2021, 8, 30, 10, 57, 14, 579328)
    for curIdx, oldHistoryResult in enumerate(historyAllResultCursor):
        curTime = datetime.now() # datetime.datetime(2021, 8, 30, 10, 57, 25, 110374)
        elapsedTime = curTime - lastUpdateTime # datetime.timedelta(seconds=10, microseconds=531046)
        elapsedTimeSeconds = elapsedTime.total_seconds() # 2.65892
        isShouldUpdateSession = elapsedTimeSeconds > RefreshSessionPerSeconds
        # if (curIdx % RefreshSessionPerNum) == 0:
        if isShouldUpdateSession:
            lastUpdateTime = curTime
            cmdResp = mongoDb.command("refreshSessions", [mongoSessionId], session=mongoSession)
            logging.info("Called refreshSessions command, resp=%s", cmdResp)
        
        # do what you want

        existedNewResult = mongoCollectionNew.find_one({"shortLink": "http://xxx"}, session=mongoSession)

    # mongoSession.close()
    mongoSession.end_session()
crifan