
If two cursor.forEach() calls are nested, the inner one does not get executed. The same happens with a while loop.

I want to remove duplicates from a huge collection, by moving documents to another collection, and checking if a duplicate already exists. I'm running the following code in the mongo shell:

var fromColl = db.from.find(),
    toColl;

fromColl.forEach(function(fromObj){
    toColl = db.to.find({name: fromObj.name});
    if (toColl.length() == 0) {
        //no duplicates found in the target coll, insert
        db.to.insert(fromObj);
    } else {
        //possible duplicates found in the target coll
        print('possible duplicates: ' + toColl.length());
        toColl.forEach(function(toObj){
            if (equal(fromObj.data, toObj.data)) {
                //duplicate...
            }
        });
    }
});

In the else block toColl.length() is printed, but the second forEach loop isn't executed. Does anyone know why?

orszaczky
  • Because `!0` is a True condition. This is a pretty well known programming idiom. Should be `toColl.length() > 0` – Neil Lunn Jun 11 '17 at 11:50
  • @NeilLunn That part works as its supposed, if there are 0 elements, the if part is executed, if it's false (there are more than 0 elements), then the else part is executed. But in that case the forEach loop is not executed – orszaczky Jun 11 '17 at 11:53
  • @NeilLunn can you explain how is my question the duplicate of the one you linked? I replaced the (!toColl.length()) shorthand with (toColl.length() == 0) because i think it confused you... – orszaczky Jun 11 '17 at 12:06
  • I found a solution for the problem, if the question gets reopened, I will share it... – orszaczky Jun 12 '17 at 11:02
  • Just saying "this isn't a duplicate" forces reviewers to guess what you are asking, and then determine if it is different from what the duplicate question is asking. For most reviewers, this is just too much effort. You need to spell out why you think it is not a duplicate. Something much more likely to get the appropriate response is "this question is about ___, but the proposed duplicate is about ___, therefore this question is not a duplicate". If you aren't willing to spend the effort to explain why your question is not a duplicate, reviewers shouldn't be expected to do the research. – jmarkmurphy Jun 12 '17 at 13:27
  • @jmarkmurphy, thank you, good point.. i added an explanation – orszaczky Jun 12 '17 at 14:11

1 Answer


--- WORKAROUND ---

I found a workaround: convert the second cursor to an array with toColl = db.to.find({name: fromObj.name}).toArray(); and iterate that array with a plain JS for loop:

var fromColl = db.from.find(),
    toColl,
    toObj;

fromColl.forEach(function(fromObj){
    toColl = db.to.find({name: fromObj.name}).toArray();
    if (toColl.length == 0) {
        //no duplicates found in the target coll, insert
        db.to.insert(fromObj);
    } else {
        //possible duplicates found in the target coll
        // toColl is an array here, so length is a property, not a method
        print('possible duplicates: ' + toColl.length);
        for (var i = 0; i < toColl.length; i++) {
            toObj = toColl[i];
            if (equal(fromObj.data, toObj.data)) {
                //duplicate...
            }
        }
    }
});

--- UPDATE ---

As Stephen Steneker pointed out:

The mongo shell has some shortcuts for working with data in the shell. This is explained in more detail in the MongoDB documentation: Iterate a Cursor in the mongo Shell.

In particular:

if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results.

In the code example the var declaration for toColl was prior to executing the find().

Iterating all the results with toArray() is a possible approach, but requires loading all documents returned by the cursor into RAM. Manually iterating the cursor is a more scalable approach.
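
Manual iteration could look roughly like the sketch below. It uses only the cursor's hasNext() and next() methods. The function name findDuplicate is made up for illustration, and equal stands for the comparison helper from the question, whose definition is not shown there, so it is passed in as an argument.

```javascript
// Sketch: walk a cursor one document at a time instead of calling toArray().
// `cursor` is anything exposing hasNext()/next(), e.g. db.to.find({name: ...}).
// `findDuplicate` is a hypothetical name; `equal` is a caller-supplied
// comparison function (the question does not show its definition).
function findDuplicate(cursor, fromObj, equal) {
    while (cursor.hasNext()) {
        var toObj = cursor.next();
        if (equal(fromObj.data, toObj.data)) {
            return toObj; // first duplicate found, stop iterating
        }
    }
    return null; // cursor exhausted without finding a duplicate
}
```

In the shell this would be called as findDuplicate(db.to.find({name: fromObj.name}), fromObj, equal), so only one document from the target collection is held in memory at a time.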

--- SOLUTION ---

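The main problem turned out to be using toColl.length() instead of toColl.count().

As far as I can tell, toColl.length() iterates the whole cursor to count its documents, which leaves the cursor exhausted, so the forEach() that follows has nothing left to iterate. toColl.count() asks the server for the count without consuming the cursor.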
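
Put together, the loop from the question with count() swapped in might look like this sketch. It is wrapped in a function only so that it is self-contained: moveWithoutDuplicates and the onDuplicate callback are names invented here, db is the shell's database handle, and equal is again the question's comparison helper passed in as an argument.

```javascript
// Sketch of the original loop with count() in place of length().
// Assumptions: `moveWithoutDuplicates` and `onDuplicate` are hypothetical
// names; `equal` is the comparison helper from the question.
function moveWithoutDuplicates(db, equal, onDuplicate) {
    db.from.find().forEach(function (fromObj) {
        var toColl = db.to.find({name: fromObj.name});
        if (toColl.count() == 0) {
            // no candidates in the target collection, insert
            db.to.insert(fromObj);
        } else {
            // count() did not consume the cursor, so forEach() still runs
            toColl.forEach(function (toObj) {
                if (equal(fromObj.data, toObj.data)) {
                    onDuplicate(fromObj, toObj);
                }
            });
        }
    });
}
```

What to do with a confirmed duplicate, or with a name match whose data differs, is left to the caller, as in the original code.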

Big thanks to Rhys Campbell and Stephen Steneker of the MongoDB user group for helping to resolve this bug.

orszaczky