
I need an interface to MongoDB that lets me treat data in a collection like a standard Python file-like object. These will be fairly small files (a few kilobytes at most), and in particular I need the ability to append to these so-called files. (So this question is not a dupe.)

I have read the GridFS documentation, and in particular it says I should not use it for small files. The only other implementations I've been able to find have all been in PHP. I'm not really looking for help writing any specifics of the code, but implementing the entire file API seems a daunting task.

  1. Are there any shortcuts or tools that make it easier to implement file-like objects in Python 2?
  2. Am I missing that someone has already done this?

(Why am I doing this? Because I received an eleventh-hour requirement that we deploy a pre-existing application that produces CSV files on a multinode cloud environment that cannot transparently handle files.)

kojiro

2 Answers


For question 1: check out the io module, and especially IOBase. It implements most of the file-like API (readline, readlines, iteration, context-manager support) in terms of a small, fairly sensible set of methods that you supply yourself.
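As a rough illustration of that idea, here is a minimal sketch that subclasses io.RawIOBase and defines only readinto() and write(); the base class supplies read(), readline(), readlines() and iteration. The class name CollectionFile is hypothetical, and a plain dict stands in for the MongoDB collection, so the storage calls here are assumptions rather than real PyMongo code. It works in bytes only, and write() always appends.

```python
import io


class CollectionFile(io.RawIOBase):
    """Append-only, file-like view over one named record in a store.

    `store` is a plain dict standing in for a MongoDB collection
    (an assumption made for illustration, not a PyMongo API).
    """

    def __init__(self, store, name):
        super(CollectionFile, self).__init__()
        self._store = store
        self._name = name
        self._pos = 0  # read position; writes always go to the end
        store.setdefault(name, b"")

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, buf):
        # Copy up to len(buf) bytes from the current read position.
        data = self._store[self._name][self._pos:self._pos + len(buf)]
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)

    def write(self, data):
        # Append semantics: new bytes are added to the end of the record.
        data = bytes(data)
        self._store[self._name] += data
        return len(data)


store = {}
f = CollectionFile(store, "report.csv")
f.write(b"a,b\n")
f.write(b"1,2\n")
first_line = f.readline()  # provided by IOBase, built on readinto()
```

Replacing the dict lookups with real collection queries and updates would be the next step; the file-like surface would stay the same.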

David Wolever
  • IOBase was very helpful, although its readline(s) methods apparently require read() to output bytes, which means you have to manually juggle character encodings (never fun, but especially in Python). – kojiro Jun 24 '12 at 13:40

You could just store the data as binary, or text, in a MongoDB collection. But you'd have two problems:

  1. You'd have to implement as much of the Python file protocol as your other code expects to have implemented.

  2. When you append to the "file", the document would grow in MongoDB and possibly need to be moved on disk to a location with enough space to hold the larger document. Moving documents is expensive.

Go with GridFS. The documentation discourages you from using it for static files, but for your case it's perfect because PyMongo has already done the work of implementing Python's file protocol for MongoDB data. To append to a GridFS file you must read it, save a new version with the additional data, and delete the previous version. But this isn't much more expensive than moving a grown document anyway.
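The read, re-save, delete cycle could look roughly like the sketch below. The helper name append_to_gridfs_file is hypothetical; it assumes `fs` is a gridfs.GridFS instance and relies only on its exists(), get_last_version(), put() and delete() methods. It is not atomic, so concurrent appenders would need their own locking.

```python
def append_to_gridfs_file(fs, filename, data):
    """Append `data` (bytes) to the newest version of `filename`.

    `fs` is assumed to be a gridfs.GridFS instance. Returns the id of
    the newly stored version.
    """
    if fs.exists(filename=filename):
        old = fs.get_last_version(filename)
        # Write the new version first so the data is never missing,
        # then drop the version it replaces.
        new_id = fs.put(old.read() + data, filename=filename)
        fs.delete(old._id)
    else:
        new_id = fs.put(data, filename=filename)
    return new_id


# Usage sketch (needs a running MongoDB server; the database name
# "mydb" is a placeholder):
#
#   import gridfs
#   from pymongo import MongoClient
#   fs = gridfs.GridFS(MongoClient()["mydb"])
#   append_to_gridfs_file(fs, "report.csv", b"1,2\n")
```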

A. Jesse Jiryu Davis