
I am using pymongo in the following way:

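from pymongo import MongoClient
client = MongoClient()  # assumes a local mongod on the default port
db1 = client.db1
a = {'key1': 'value1'}
db1.collection1.insert_one(a)  # insert() in older PyMongo; removed in 4.0
print(a)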

This prints

{'_id': ObjectId('53ad61aa06998f07cee687c3'), 'key1': 'value1'}

on the console. I understand that _id is added to the MongoDB document, but why is it added to my Python dictionary too? I did not intend this, and I am wondering what the purpose is. I could be using this dictionary for other purposes too, and it gets updated as a side effect of inserting it as a document. If I have to, say, serialize this dictionary into a JSON object, I get a

ObjectId('53ad610106998f0772adc6cb') is not JSON serializable

error. Shouldn't the insert function leave the dictionary unchanged while inserting the document into the database?

user835199
  • _id is the primary key of the document and it's a required field. If _id doesn't exist, MongoDB will automatically create an ObjectId as _id. – Christian P Jun 27 '14 at 13:01
  • http://docs.mongodb.org/manual/core/crud-introduction/#mongodb-crud-introduction. Actually, the object is not JSON; it is BSON. – sundar nataraj Jun 27 '14 at 13:04
  • My question is not why _id was added to the mongodb document. My question is why this key is added to my python dictionary, because I did not intend to change my dictionary object. I am updating my question to put this more clearly. – user835199 Jun 27 '14 at 14:59
  • The key is added because it is a new document and the parameter you send into insert is passed by reference. If you don't want it, just unset it. – Sammaye Jun 27 '14 at 15:27
  • @Sammaye "My question is not why _id was added to the mongodb document. My question is why this key is added to my python dictionary, because I did not intend to change my dictionary object. I am updating my question to put this more clearly." – lucid_dreamer Aug 19 '18 at 12:00
  • @lucid_dreamer I do actually make that clear: "insert is by reference", which means that variables passed to this function are passed by reference, not copy-on-write. – Sammaye Aug 19 '18 at 12:20
  • Still, why is there a need to add that to the in-memory dictionary passed to that function by reference? That's not clear. – lucid_dreamer Aug 21 '18 at 09:14
  • @lucid_dreamer you'd have to ask the Python team, but it's the same in all languages that support pass by reference, so I would say it is just a personal choice; I guess maybe because it is more elegant than having to get the id in other ways. – Sammaye Aug 22 '18 at 21:00
  • I don't get it. Why is it related to the python team? It is pymongo that adds that _id to the in-memory dictionary (after the insert). There is no need to do that as far as I can see. – lucid_dreamer Aug 29 '18 at 15:03
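The "by reference" point in the comments can be illustrated without a database. The sketch below uses a hypothetical stand-in, fake_insert_one, that mimics the behavior described in the question (adding an _id key to the dict it is given when one is missing). It is not PyMongo's code, and the random hex string only imitates the textual look of an ObjectId.

```python
import secrets


def fake_insert_one(collection, document):
    """Hypothetical stand-in for an insert (not PyMongo code).

    Like the behavior in the question, it adds "_id" to the very dict it
    receives when that dict has none, so the caller sees the change.
    """
    if "_id" not in document:
        # 24 hex characters, imitating the textual form of an ObjectId
        document["_id"] = secrets.token_hex(12)
    collection.append(dict(document))  # "store" a snapshot of the document
    return document["_id"]


collection = []
a = {"key1": "value1"}
inserted_id = fake_insert_one(collection, a)

print(a)  # the caller's dict now has an "_id" key as well
print(a["_id"] == inserted_id)  # True: the same value that was "stored"
```

Because Python hands the function the dict object itself rather than a copy, any key added inside the function is visible through a afterwards.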

5 Answers


Like many other database systems, PyMongo adds the unique identifier needed to retrieve the data from the database as soon as the document is inserted. (What would happen if you inserted two dictionaries with the same content, {'key1':'value1'}, into the database? How would you tell that you want this one and not that one?)

This is explained in the PyMongo docs:

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection.

If you want to change this behavior, you can give the dictionary an _id key before inserting. In my opinion, this is a bad idea: it can easily lead to collisions, and you would lose useful information stored in a "real" ObjectId, such as its creation time, which is great for sorting and things like that.

>>> a = {'_id': 'hello', 'key1':'value1'}
>>> collection.insert(a)
'hello'
>>> collection.find_one({'_id': 'hello'})
{u'key1': u'value1', u'_id': u'hello'}

Or if your problem comes when serializing to Json, you can use the utilities in the BSON module:

>>> a = {'key1':'value1'}
>>> collection.insert(a)
ObjectId('53ad6d59867b2d0d15746b34')
>>> from bson import ObjectId, json_util
>>> json_util.dumps(collection.find_one({'_id': ObjectId('53ad6d59867b2d0d15746b34')}))
'{"key1": "value1", "_id": {"$oid": "53ad6d59867b2d0d15746b34"}}'

(you can verify that this is valid JSON with a validator such as jsonlint.com)

Savir

_id acts as the primary key for documents; unlike in SQL databases, it is required in MongoDB.

To make _id serializable, you have two options:

  1. Set _id to a JSON-serializable data type (e.g. int, str) in your documents before inserting them, but keep in mind that it must be unique per document.

  2. Use custom JSON encoder/decoder classes that delegate to bson.json_util:

    import json
    
    from bson.json_util import default as bson_default
    from bson.json_util import object_hook as bson_object_hook
    
    
    class BSONJSONEncoder(json.JSONEncoder):
        def default(self, o):
            # Let bson.json_util convert BSON types such as ObjectId
            return bson_default(o)
    
    
    class BSONJSONDecoder(json.JSONDecoder):
        def __init__(self, **kwargs):
            # Turn Extended JSON such as {"$oid": "..."} back into BSON types
            kwargs.setdefault("object_hook", bson_object_hook)
            super().__init__(**kwargs)
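For option 1, one way to get an _id that is both JSON-serializable and unique is to generate a UUID string before inserting. This is a minimal stdlib-only sketch; with_string_id is a hypothetical helper, and the actual insert call (e.g. collection.insert_one(doc)) is left out so the snippet runs without a MongoDB server.

```python
import json
import uuid


def with_string_id(document):
    """Hypothetical helper: return a copy of `document` with a string _id.

    uuid4() values are random 128-bit identifiers, so collisions are
    extremely unlikely; MongoDB still rejects a duplicate _id on insert.
    """
    doc = dict(document)
    doc.setdefault("_id", str(uuid.uuid4()))
    return doc


a = {"key1": "value1"}
doc = with_string_id(a)

# A string _id survives json.dumps, unlike an ObjectId
print(json.dumps(doc))
```

Working on a copy keeps the caller's dict free of _id, which also sidesteps the side effect the question is about.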
    
MBarsi

As @BorrajaX already answered, I want to add a bit more. _id is a unique identifier: when a document without one is inserted into the collection, an ObjectId is generated for it (built from a timestamp, a random value and a counter, so not purely random numbers). You can either set your own _id or use the one MongoDB creates for you.

The documentation describes this.

In your case, you can simply remove this key with the del statement: del a["_id"].
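If the dictionary must stay usable elsewhere, an alternative to deleting the key in place is to leave _id out only when serializing. This is a stdlib-only sketch; to_json_without_id is a hypothetical helper, and a plain object() stands in for the ObjectId that PyMongo would have added.

```python
import json


def to_json_without_id(document):
    """Hypothetical helper: serialize `document` to JSON, skipping "_id".

    The original dict is left untouched (unlike `del a["_id"]`).
    """
    return json.dumps({k: v for k, v in document.items() if k != "_id"})


# Placeholder object standing in for the ObjectId added on insert
a = {"key1": "value1", "_id": object()}

print(to_json_without_id(a))  # {"key1": "value1"}
```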

or

If you need _id for further operations, you can use dumps from the bson.json_util module:

import json
from bson.json_util import dumps as bson_dumps

# Replace the ObjectId with its Extended JSON form, e.g. {"$oid": "..."}
a["_id"] = json.loads(bson_dumps(a["_id"]))

or

Before inserting the document, you can add your own custom _id, so you won't need to serialize an ObjectId at all:

a["_id"] = "some_id"

db1.collection1.insert_one(a)  # insert() in PyMongo < 4
Sabuhi Shukurov

This behavior can be circumvented with the copy module: pass a copy of the dictionary to PyMongo and the original stays intact. Based on the code snippet in your example, you would modify it like so:

import copy
from pymongo import *
a = {'key1': 'value1'}
# Pass a shallow copy; PyMongo adds _id to the copy, not to a
db1.collection1.insert_one(copy.copy(a))  # insert() in PyMongo < 4
print(a)
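To see why a shallow copy is enough, the sketch below uses a hypothetical stand-in, fake_insert_one, that adds _id in place the way the question describes (it is not PyMongo code). Only a top-level key is added, so copy.copy suffices even though nested values would still be shared with the original. With real PyMongo 3+, the generated id remains available from the return value, insert_one(...).inserted_id.

```python
import copy
import secrets


def fake_insert_one(document):
    """Hypothetical stand-in for an insert that adds "_id" in place."""
    document.setdefault("_id", secrets.token_hex(12))
    return document["_id"]


a = {"key1": "value1"}
inserted_id = fake_insert_one(copy.copy(a))

print(a)  # {'key1': 'value1'}: the original dict is unchanged
```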
Utkonos

Clearly, the docs answer your question:

MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it contains more data types than JSON.

The value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. The following document contains values of varying types:

var mydoc = {
               _id: ObjectId("5099803df3f4948bd2f98391"),
               name: { first: "Alan", last: "Turing" },
               birth: new Date('Jun 23, 1912'),
               death: new Date('Jun 07, 1954'),
               contribs: [ "Turing machine", "Turing test", "Turingery" ],
               views : NumberLong(1250000)
            }

Read more about BSON.
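The same kind of error appears for any value whose type JSON lacks, not just ObjectId. As a stdlib-only illustration, a datetime stands in for a BSON date such as the birth field above: json.dumps raises TypeError unless it is given a default= fallback. Using str as that fallback is one simple choice and is not what bson.json_util does (that module emits Extended JSON such as {"$date": ...}).

```python
import json
from datetime import datetime

doc = {"name": "Alan Turing", "birth": datetime(1912, 6, 23)}

try:
    json.dumps(doc)
except TypeError as exc:
    # e.g. "Object of type datetime is not JSON serializable"
    print(exc)

# Fallback: convert anything json can't handle with str()
print(json.dumps(doc, default=str))
```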

sundar nataraj
  • The docs you are referring to are of mongodb. I am talking about pymongo here and my question is not about mongo document but about python dictionary object. – user835199 Jun 27 '14 at 15:49