7

There are many scattered posts out on StackOverflow, regarding Python modules used to save and load data.

I myself am familiar with json and pickle and I have heard of pytables too. There are probably more out there. Also, each module seems to fit a certain purpose and has its own limits (e.g. loading a large list or dictionary with pickle takes ages if working at all). Hence it would be nice to have a proper overview of possibilities.

Could you then help providing a comprehensive list of modules used to save and load data, describing for each module:

  • what the general purpose of the module is,
  • its limits,
  • why you would choose this module over others?
neydroydrec
  • 6,973
  • 9
  • 57
  • 89

2 Answers2

7

marshal:

  • Pros:

    • Can read and write Python values in a binary format. Therefore it's much faster than pickle (which is character based).
  • Cons:

    • Not all Python object types are supported. Some unsupported types such as subclasses of builtins will appear to marshal and unmarshal correctly
    • Is not intended to be secure against erroneous or maliciously constructed data.
    • The Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise

shelve

  • Pros:

    • Values in a shelf can be essentially arbitrary Python objects
  • Cons:

    • Does not support concurrent read/write access to shelved objects

ZODB (suggested by @Duncan)

  • Pro:

    • transparent persistence
    • full transaction support
    • pluggable storage
    • scalable architecture
  • Cons

    • not part of standard library.
    • unable (easily) to reload data unless the original python object model used for persisting is available (consider version difficulties and data portability)
Duncan
  • 92,073
  • 11
  • 122
  • 156
qiao
  • 17,941
  • 6
  • 57
  • 46
  • Add to marshal cons: "the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise" – Janne Karila Jan 17 '12 at 12:26
  • 1
    You could add ZODB ( http://www.zodb.org/documentation/tutorial.html ) as a third. Pro: transparent persistence, full transaction support, pluggable storage, scalable architecture. Cons: not part of standard library. – Duncan Jan 17 '12 at 13:32
  • 1
    @Duncan Added, thanks. BTW, your reputation is high enough to edit this post, isn't that what SO encourages :) – qiao Jan 17 '12 at 13:39
  • Thanks for this answer so far. Collective answer is definitely encouraged and seems fitting for this question. Also feel free to add your opinion on pro's and con's regarding json etc. as mentioned in the question, it will be only better for comparison. – neydroydrec Jan 17 '12 at 19:58
4

There is an overview of the standard lib data persistence modules.

Gandaro
  • 3,427
  • 1
  • 17
  • 19
  • 2
    that's hardly an answer, and not what the OP asked for! see http://stackoverflow.com/faq#deletion – Don Question Jan 17 '12 at 12:35
  • @DonQuestion There is a good source for finding out about the modules for data persistence in the standard lib where you can find the pros and cons of them, so why do you have to write again just what is already written in the docs? – Gandaro Jan 17 '12 at 13:05
  • To answer a question you have to read it. The OP hinted that he has already looked into some persistence solutions and was aware of their weaknesses. He was **not** asking for a link to the standard python persistence modules, but for an educated recommendation of some soultions and **why** we would suggest them with pros/contras and their primary use-cases. I don't see this either in your answer, nor in the provided link. If you would follow my suggested link you would realize, that your answer is not sufficient. Always keep in mind, that other future users may have similar questions. – Don Question Jan 17 '12 at 15:41