4

I wrote a Python program that takes several hours to run. I now want it to save its entire in-memory state (mainly numpy arrays) from time to time, so that I can restart the calculation from the point of the last save. I am not looking for something like 'numpy.save(file,arr)', but for a way to save everything in memory at once...

Kind Regards, Mattias

  • I doubt this is possible. You will probably have to do that manually. But it would be a nice feature. – freakish Jan 08 '14 at 10:32
  • 1
    Have a look at the module `joblib` (you probably will have to install this first); it provides means to solve your issues, especially concerning `numpy`. – Alfe Jan 08 '14 at 10:32
  • check out this question http://stackoverflow.com/q/141802/3005167 (Assuming you want to save the entire state of the program) – MB-F Jan 08 '14 at 10:34
  • Are you asking for something like the [save](http://www.mathworks.es/es/help/matlab/ref/save.html) function in Matlab? – phyrox Jan 08 '14 at 11:06
  • I don't know Matlab very well but indeed I think this is what I am looking for... – user3172738 Jan 09 '14 at 10:03

2 Answers

1

I agree with @phyrox, that dill can be used to persist your live objects to disk so you can restart later. dill can serialize numpy arrays with dump(), and the entire interpreter session with dump_session().
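As a rough illustration of the `dump()` route, here is a minimal checkpointing sketch. It assumes `dill` and `numpy` are installed; the helper names `save_checkpoint` and `load_checkpoint`, the `state` dictionary, and the file name are hypothetical and only there for the example.

```python
import os
import tempfile

import dill
import numpy as np


def save_checkpoint(path, state):
    """Serialize a dict of objects (here: numpy arrays and a counter) to disk."""
    with open(path, "wb") as fh:
        dill.dump(state, fh)


def load_checkpoint(path):
    """Read back whatever save_checkpoint() wrote."""
    with open(path, "rb") as fh:
        return dill.load(fh)


# Hypothetical program state: the current step and an intermediate result.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
state = {"step": 3, "values": np.arange(5.0)}

save_checkpoint(checkpoint_path, state)
restored = load_checkpoint(checkpoint_path)
```

Collecting everything you need in one dictionary keeps the checkpoint explicit; `dump_session()` is the alternative when you would rather capture the whole interpreter session.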

However, it sounds like you are really asking about some form of caching… so I'd have to say that the comment from @Alfe is probably a bit closer to what you want. If you want seamless caching and archiving of arrays to memory… then you want joblib or klepto.
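To make the caching idea concrete, here is a small sketch using joblib's `Memory` class, assuming `joblib` and `numpy` are installed. The function `expensive_step`, the `call_count` counter, and the temporary cache directory are made up for illustration.

```python
import tempfile

import numpy as np
from joblib import Memory

# Cache results on disk in a throwaway directory (an assumption for this demo;
# a real run would point at a persistent location).
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

# Counts how often the function body actually runs.
call_count = {"n": 0}


@memory.cache
def expensive_step(values):
    """Stand-in for a long calculation on a numpy array."""
    call_count["n"] += 1
    return np.cumsum(values ** 2)


data = np.arange(4.0)
first = expensive_step(data)   # computed and written to the cache
second = expensive_step(data)  # same input, so the result is read from the cache
```

Because the cache lives on disk, a restarted program that calls the same function with the same arguments should pick up the stored result instead of recomputing it.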

klepto is built on top of dill, and can cache function inputs and outputs to memory (so that calculations don't need to be run twice), and it can seamlessly persist objects in the cache to disk or to a database.

The versions on github are the ones you want. https://github.com/uqfoundation/klepto or https://github.com/joblib/joblib. Klepto is newer, but has a much broader set of caching and archiving solutions than joblib. Joblib has been in production use longer, so it's better tested -- especially for parallel computing.

Here's an example of typical klepto workflow: https://github.com/uqfoundation/klepto/blob/master/tests/test_workflow.py

Here's another that has some numpy in it: https://github.com/uqfoundation/klepto/blob/master/tests/test_cache.py

Mike McKerns
0

Dill can be your solution: https://pypi.python.org/pypi/dill

Dill provides the user the same interface as the 'pickle' module, and also includes some additional features. In addition to pickling Python objects, dill provides the ability to save the state of an interpreter session in a single command. Hence, it would be feasible to save an interpreter session, close the interpreter, ship the pickled file to another computer, open a new interpreter, unpickle the session and thus continue from the 'saved' state of the original interpreter session.

An example:

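    import dill as pickle
    from numpy import array

    a = array([1, 2])
    pickle.dump_session('session.pkl')  # save every object in the session to disk
    a = 0                               # overwrite the array
    pickle.load_session('session.pkl')  # restore the saved session, including `a`
    print(a)                            # shows the restored array, not 0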

Since dill conforms to the 'pickle' interface, the examples and documentation at http://docs.python.org/library/pickle.html also apply to dill if you import dill as pickle.

Note that some types of objects cannot be pickled, so check that your data is supported first.
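One way to run that check is dill's `pickles()` helper, which returns whether an object survives a round trip through pickling. This is only a sketch, assuming `dill` and `numpy` are installed; the two sample objects are made up for illustration.

```python
import dill
import numpy as np

# A numpy array, the kind of data the question is mostly about.
array_ok = dill.pickles(np.arange(1, 4))

# A generator, one of the object types dill does not serialize.
generator_ok = dill.pickles(i for i in range(3))

print("array picklable:", array_ok)
print("generator picklable:", generator_ok)
```

Running such a check on each object before a long calculation starts avoids finding out only at the first checkpoint that something cannot be saved.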

phyrox
  • I have now tried using the pickle module, but it gives problems when I unpickle objects that are instances of a class. What I did: I put all the objects of one class in a list, which I pickle. When I unpickle this list I get the AttributeError: 'module' object has no attribute '*Class Name*'. Any idea? – user3172738 Jan 09 '14 at 10:13
  • @user3172738: are you using the `pickle` module or the `dill` module? And you'd have a better chance of having someone help answer your question in the comment if you edited your original post (or something similar) with an update that showed what happens when you tried it. – Mike McKerns Jan 23 '14 at 03:35