I am currently pulling data into MongoDB and will later need to pull this data into a separate application. That application requires the _id field to be a 32-bit integer.

"Be sure to explicitly set the _id attribute in the result document to unique 32 bit integers." (source)

I am making use of pymongo to insert documents into a collection.

def parse_tweet(in_t):
    t = {}
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t

This gives me the expected documents:

{
  "_id" : ObjectId("50a0de04f26afb14f4bba03d"),
  "text" : "hello world",
  "shape" : [144.9557834, -37.8208589]
}

How can I explicitly set the _id value to be a 32-bit integer?
I don't intend to store more than 6 million documents.

jakc

1 Answer

Just generate an id and pass it along. The _id can be a value of any type (except an array).

def parse_tweet(in_t):
    t = {}
    t["_id"] = get_me_an_int32_id()  # placeholder: supply your own unique int32 generator
    t["text"] = in_t["text"]
    t["shape"] = in_t["coordinates"]["coordinates"][0], in_t["coordinates"]["coordinates"][1]
    return t

You will have to take care of uniqueness yourself. MongoDB only enforces that no two documents in a collection share the same _id (an insert with a duplicate _id fails). Where the unique values come from is up to you.

Here are some ideas: How to make an Autoincrementing field.
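
As a rough illustration of what a get_me_an_int32_id helper could look like, here is a minimal sketch for the simple case of a single process doing all the inserts. The names int32_ids and INT32_MAX are made up for this example, and parse_tweet is changed to take the id source as a second argument. With several concurrent writers an in-process counter like this is not safe; an atomic counter kept in the database, along the lines of the autoincrement link above, would be needed instead.

```python
import itertools

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def int32_ids(start=1):
    """Yield consecutive integers, refusing to leave the signed 32-bit range.

    Hypothetical helper: only safe when one process generates all ids.
    """
    for n in itertools.count(start):
        if n > INT32_MAX:
            raise OverflowError("ran out of 32-bit ids")
        yield n


def parse_tweet(in_t, ids):
    """Build the document to insert, taking the next id from `ids`."""
    lon, lat = in_t["coordinates"]["coordinates"][:2]
    return {"_id": next(ids), "text": in_t["text"], "shape": [lon, lat]}


ids = int32_ids()
tweet = {
    "text": "hello world",
    "coordinates": {"coordinates": [144.9557834, -37.8208589]},
}
doc = parse_tweet(tweet, ids)  # doc["_id"] is 1, the next call gets 2, ...
```

Six million documents is far below the 2**31 - 1 limit, so the overflow check is only a safety net. As far as I know, pymongo encodes a Python int that fits in 32 bits as a BSON 32-bit integer, but it is worth checking a stored document before relying on that.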

Sergio Tulentsev
  • pymongo may handle this differently (I use PHP), since this is not covered in the docs: http://api.mongodb.org/python/current/api/index.html. However, MongoDB does not store numbers as strict 32-bit integers by default, so if your app is not written to handle this it could break. I would make sure to use the equivalent of http://php.net/manual/en/class.mongoint32.php, or just code the app to handle this slight facet of Mongo. – Sammaye Nov 12 '12 at 12:44
  • Any chance you could share some of the logic in the get_me_an_int32_id function? I'm looking at your link, but I'm not sure what the most elegant counter for Python would be. – jakc Nov 12 '12 at 22:20
  • Normally the best way, if you have control over the application, is to code it to cast all numbers read from MongoDB to safe ints using the `int()` function, as shown here: http://stackoverflow.com/questions/3387655/safest-way-to-convert-float-to-integer-in-python – Sammaye Nov 13 '12 at 08:14