I'm trying to do something that I feel ought to be pretty trivial, so forgive me if there's an easy solution out there elsewhere.

I'm writing tests for some content indexing, and for this I'm trying to insert some binary data (a PDF) into a MongoDB collection. However, I'm having a good deal of trouble with it. This is the current state of the relevant code:

pseudo_file = StringIO()
pdf = pisa.CreatePDF("This is a test", pseudo_file)
test = {"data": pseudo_file}
test.update({"files_id": {"name": "random_asset_name"}, "category": "asset"})
self.chunk_collection.insert(json.dumps(test))

I managed to find an old thread on the PyMongo Google group addressing this problem (https://groups.google.com/forum/#!topic/mongodb-user/uBAbY1wdQbs), but I can't find the Binary object that was used to fix it there, and it doesn't seem to be included in Python (I'm using 2.7).

Right now the error I'm getting is that the StringIO object is not JSON serializable, which is sensible, but pymongo needs a valid UTF-8 object passed to it. I tried using a base64 encoding of StringIO.getvalue(), and also serializing the same value directly.

Of course the PDF is not valid UTF-8, so I'm wondering if there's another way to have pymongo recognize that I am sending it raw binary. Any help is appreciated.

Slater Victoroff
  • What code did you try when you used the version on the google groups? What error did you get? There should be a binary helper you can use to store binary info in mongo – Sammaye Aug 13 '13 at 20:28
  • @Sammaye I tried the example just as shown. Do you have any idea where the binary helper is? Because it's not a part of python, it's not in the pymongo docs, and it doesn't seem to exist in any relevant python module. – Slater Victoroff Aug 13 '13 at 21:00
  • You have to import it from the pymongo driver, here is a full example: http://stackoverflow.com/questions/11915770/saving-picture-to-mongodb and here is the docs: http://api.mongodb.org/python/current/api/bson/binary.html – Sammaye Aug 13 '13 at 21:19
  • @Sammaye You just pointed me towards two totally different things D: The import statement in the first link isn't working for me, I'm using pymongo 2.4.1, maybe it's a recent rollout? The second link was way more helpful. Do you want to make this an answer so I can give you credit? It seems like a lot of the lore around this feature is at least somewhat faulty. – Slater Victoroff Aug 13 '13 at 22:08
  • Yes I believe now that I look: https://github.com/Fiedzia/Fang-of-Mongo/issues/11 that the binary file was moved – Sammaye Aug 13 '13 at 22:11
  • @Sammaye Oooh! Now it all makes sense. Thanks for doing all that digging. You should totally make an answer out of it so I can accept it and someone coming by later doesn't have to dig through the comments. – Slater Victoroff Aug 13 '13 at 22:19

2 Answers

The Google group thread is actually correct. However, some time after that post the Binary class was moved into the bson namespace, so you must import it from there.

Good examples exist on the documentation page: http://api.mongodb.org/python/current/api/bson/binary.html

Sammaye

This can be achieved with bson. A full round trip, using pickling and unpickling of an object as the example, would look like:

import pickle

import bson

# serialization
collection.insert_one({
    "binary_field": bson.Binary(pickle.dumps(my_object)),
})

# deserialization
record = collection.find_one({ ... })
pickle.loads(record["binary_field"])
# Note that the Binary type can be passed into pickle.loads directly.

Note that the bson package, despite being a top-level package, is part of pymongo. According to the pymongo package description:

Do not install the “bson” package from pypi. PyMongo comes with its own bson package; doing “easy_install bson” installs a third-party package that is incompatible with PyMongo.

bluenote10