18

I am getting the following error when trying to do a bulk insert into an empty mongodb collection.

pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: cmdDistros.locDistro.$id dup key: { : ObjectId('51dac9d0c74cd81acd85c0fd') }

I am not specifying an _id when I create any of the documents, so MongoDB should generate a unique _id for each one, correct? Here is the code I used:

# Populate database with uniform distribution
entries = []
for coor in freeIndices:
    for theta in range(360):
        entry = {"x": coor[0], "y": coor[1], "heading": theta}
        for i in range(numData):
            entry["data" + str(i)] = 1./numData
        entries.append(entry)
print "Entries created, loading into database..."

locDistro.insert(entries)

Taking fate out of MongoDB's hands, I tried assigning my own _id values using:

# Populate database with uniform distribution
entries = []
idNum = 0
for coor in freeIndices:
    for theta in range(360):
        print idNum
        entry = {"_id": idNum, "x": coor[0], "y": coor[1], "heading": theta}
        idNum += 1
        for i in range(numData):
            entry["data" + str(i)] = 1./numData
        entries.append(entry)
print "Entries created, loading into database..."

locDistro.insert(entries, manipulate=False)

The print statement showed each idNum as the documents were created, and they were all unique and incremented just as expected. However, on insert, I received the error:

pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: cmdDistros.locDistro.$id dup key: { : 0 }

and only one document was inserted into my database.

I am completely stumped. Does anyone know why this might be happening?

RoboCop87
  • I don't know what happened, but the problem has somehow fixed itself. I just ran the code over and over and it eventually worked. Weird. If anyone has an explanation, I would still like to know in case it happens again. By the way, the entries.append line is a typo; its actual position is aligned with the for above it. – RoboCop87 Jul 08 '13 at 14:57
  • Are there any other indexes defined in the collection? – WiredPrairie Jul 08 '13 at 14:57
  • None. Just _id. I would like to use ensure_index to create an index on x, y, and heading, but they aren't unique, so I am not sure that would work. Regardless, only _id is indexed at this point. – RoboCop87 Jul 08 '13 at 15:09
  • Are both options working now: _id created by the driver and _id created by yourself? – innoSPG Jul 08 '13 at 20:18
  • The _id created by the driver is working now. I have not retried creating my own _ids, and experience has left me wary of touching code once I know it works. – RoboCop87 Jul 09 '13 at 13:20
  • How did you create the locDistro object? By the way, you should be able to uncheck the accept flag by clicking on it. – Xavier Combelle Aug 19 '14 at 07:39

5 Answers

29

You need to understand that your entries list has a bunch of references to one entry dict. So when PyMongo sets entries[0]['_id'], all the other entries get the same _id. (In fact, PyMongo will iterate through the list setting each entry's _id, so all the entries will have the final _id at the end.) A quick fix would be:

entries.append(entry.copy())

This is merely a shallow copy, but in the code you shared I believe this is enough to fix your problem.
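
To make the aliasing concrete, here is a minimal sketch that runs without a database. The helper `assign_ids` is a hypothetical stand-in for the driver's behavior of adding an `_id` in place when one is missing; it is not a PyMongo function, and the integer ids are an assumption made only to keep the output readable.

```python
from itertools import count


def assign_ids(docs, counter):
    """Hypothetical stand-in for the driver: add '_id' in place if missing."""
    for doc in docs:
        if "_id" not in doc:
            doc["_id"] = next(counter)
    return [doc["_id"] for doc in docs]


# Case 1: the same dict is appended three times, so the list holds
# three references to a single object.
entry = {"x": 0, "y": 0}
shared = []
for theta in range(3):
    entry["heading"] = theta
    shared.append(entry)
shared_ids = assign_ids(shared, count(1))

# Case 2: a shallow copy is appended, so every element is its own dict.
entry = {"x": 0, "y": 0}
copied = []
for theta in range(3):
    entry["heading"] = theta
    copied.append(entry.copy())
copied_ids = assign_ids(copied, count(1))

print(shared_ids)  # every element reports the same _id
print(copied_ids)  # each element gets its own _id
```

In the first case every list element also ends up with the last `heading` value, which is another sign that only one dict exists.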

A. Jesse Jiryu Davis
  • The entry is created on every loop iteration, so how can they be pointing to the same object? I don't think this is the correct answer. – Arshad Ansari Mar 15 '17 at 15:46
  • I am stuck with this weird behavior of PyMongo's `insert_one` as well. I have heard of both the `copy` method and the `del obj['_id']` method. I still don't get the copy method, though. Do I have to make a copy of each object I insert even though the object is different? – addicted Feb 14 '19 at 14:51
  • This answer also applies to NodeJS when inserting multiple objects with the same reference lol. Thanks – Ricky Boyce May 17 '19 at 03:22
  • This fixed my issue; however, I'm still not clear on why this should be the case. entry is initialized on each iteration before being appended to entries. Something weird is going on. – Anthony Awuley Dec 02 '19 at 00:47

10

Delete the key "_id":

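doc = {}  # the same dict is reused on every iteration
for i in xrange(2):
    doc['i'] = i
    if '_id' in doc:
        del doc['_id']  # drop the _id added during the previous insert
    collection.insert(doc)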

Or manually create a new one:

from bson.objectid import ObjectId

doc = {}  # the same dict is reused on every iteration
for i in xrange(2):
    doc['i'] = i
    doc['_id'] = ObjectId()  # fresh unique _id for each insert
    collection.insert(doc)

See also: Getting "err" : "E11000 duplicate key error when inserting into mongo using the Java driver

tbobm
gaurhari dass
5

I had the same error using insert_one() and also insert_many().

My solution is to use update_one() with upsert=True:

doc = {'a': 1, 'b': 2, 'x': {'xx': "hello", 'yy': "world"}}
db.collection.update_one(doc, {'$set': doc}, upsert=True)

This works for me :-)

Alfredo cubitos
3

Make sure the variable 'entries' is cleared after every insert.

The problem is that PyMongo injects an _id field into the document before inserting it if the field does not already exist (_id is always generated client side). That means the insert method adds _id the first time through the loop. Since 'entries' is defined outside the loop, each subsequent pass reuses the same _id value.

Clear the dict variable at the top of the loop body.

OR

Remove _id from the dict. For example:

del my_dict['_id']
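
The effect described above can be sketched without a running server. `FakeCollection` and `DuplicateKey` below are hypothetical in-memory stand-ins, not PyMongo classes; they only imitate two behaviors mentioned in this answer: an `_id` is added to the caller's dict when missing, and a repeated `_id` is rejected.

```python
from itertools import count


class DuplicateKey(Exception):
    """Hypothetical stand-in for a duplicate key error."""


class FakeCollection:
    """Hypothetical in-memory collection; not part of PyMongo."""

    def __init__(self):
        self._ids = count(1)
        self.docs = {}

    def insert_one(self, doc):
        if "_id" not in doc:
            doc["_id"] = next(self._ids)  # mutates the caller's dict
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = dict(doc)


# Reusing one dict: the second insert still carries the first _id.
coll = FakeCollection()
doc = {}
failed = False
for i in range(2):
    doc["i"] = i
    try:
        coll.insert_one(doc)
    except DuplicateKey:
        failed = True

# Removing the stale _id before each insert avoids the collision.
fixed = FakeCollection()
doc = {}
for i in range(2):
    doc["i"] = i
    doc.pop("_id", None)
    fixed.insert_one(doc)

print(failed, len(coll.docs), len(fixed.docs))
```

Using `doc.pop("_id", None)` is equivalent to the `del` shown above but does not raise when the key is absent on the first pass.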
Safvan CK
1

SOLUTION: Declare the dict() item inside the loop, then populate and insert it. I had a similar problem while using insert_one() from pymongo, and I solved it by declaring the dict() item inside the loop. Here is the working version of your code:

# Populate database with uniform distribution
entries = []
for coor in freeIndices:
    for theta in range(360):
        entry = dict()
        entry['x'] = coor[0]
        entry['y'] = coor[1]
        entry['heading'] = theta

        for i in range(numData):
            entry['data' + str(i)] = 1./numData
        entries.append(entry)
print "Entries created, loading into database..."

locDistro.insert(entries)
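
For readers on Python 3, the same idea might look like the sketch below. The function name `build_entries` and the sample coordinates are made up for illustration; the check at the end only confirms that every list element is a distinct dict, which is the property this answer relies on.

```python
def build_entries(free_indices, num_data):
    """Build one fresh dict per (coordinate, heading) pair."""
    entries = []
    for x, y in free_indices:
        for theta in range(360):
            # A new dict is created on every iteration, so no two
            # list elements share the same underlying object.
            entry = {"x": x, "y": y, "heading": theta}
            for i in range(num_data):
                entry["data" + str(i)] = 1.0 / num_data
            entries.append(entry)
    return entries


# Sample input (hypothetical values, two free cells).
entries = build_entries([(0, 0), (1, 2)], num_data=4)
distinct = len({id(e) for e in entries})
print(len(entries), distinct)

# With a real collection on PyMongo 3 or later, the list could then be
# passed to insert_many(entries) instead of the older insert(entries).
```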
Arslan Arif