71

Is there any way of keeping a result variable in memory so I don't have to recalculate it each time I run the beginning of my script? I am performing the exact same series of operations (5-10 sec) on a data set (which I read from disk) every time I run my script. This wouldn't be too much of a problem, since I'm pretty good at using the interactive editor to debug my code between runs; however, sometimes the interactive capabilities just don't cut it.

I know I could write my results to a file on disk, but I'd like to avoid doing so if at all possible. This should be a solution which generates a variable the first time I run the script, and keeps it in memory until the shell itself is closed or until I explicitly tell it to fizzle out. Something like this:

# Check if variable already created this session
in_mem = var_in_memory() # Returns pointer to var, or False if not in memory yet
if not in_mem:
    # Read data set from disk
    with open('mydata', 'r') as in_handle:
        mytext = in_handle.read()
    # Extract relevant results from data set
    mydata = parse_data(mytext)
    result = initial_operations(mydata)
    in_mem = store_persistent(result)

I have an inkling that the shelve module might be what I'm looking for here, but it looks like opening a shelve variable requires specifying a file name for the persistent object, so I'm not sure it's quite what I'm looking for.

Any tips on getting shelve to do what I want it to do? Any alternative ideas?

machine yearning

8 Answers

67

You can achieve something like this by using `reload` to re-execute your main script's code. Write a wrapper script that imports your main script, asks it for the variable it wants to cache, and keeps a copy of that variable in the wrapper module's scope. Then, whenever you want (when you hit ENTER on stdin, or whatever), the wrapper calls `reload(yourscriptmodule)` and hands the cached object back to it, so your script can bypass the expensive computation. Here's a quick example.

wrapper.py

import sys
from importlib import reload  # on Python 2, reload is a builtin

import mainscript

part1Cache = None
if __name__ == "__main__":
    while True:
        if not part1Cache:
            part1Cache = mainscript.part1()
        mainscript.part2(part1Cache)
        print("Press enter to re-run the script, CTRL-C to exit")
        sys.stdin.readline()
        reload(mainscript)

mainscript.py

def part1():
    print("part1 expensive computation running")
    return "This was expensive to compute"

def part2(value):
    print("part2 running with %s" % value)

While wrapper.py is running, you can edit mainscript.py, add new code to the part2 function and be able to run your new code against the pre-computed part1Cache.

Peter Lyons
  • I would consider adding an exception handler, where you run the external source. – mehmetminanc Mar 30 '14 at 02:11
  • What happens if a dependency of mainscript.py is updated? Do I need to reload it explicitly? – pomber Dec 22 '14 at 23:52
  • Would storing the variable in os.environ not suffice? – Ladmerc Apr 18 '17 at 07:38
  • Not without reload, no. The environment would be cleared when your python process died. But using the above mechanism you could store in os.environ but now you are exposing data outside of your process and are limited to strings of a certain length. By using a straightforward python variable, you avoid those limitations. – Peter Lyons Apr 19 '17 at 15:15
  • @mehmetminanc, good suggestion as I need the exception handler immediately to make the example workable. – auro May 15 '17 at 20:33
  • Excellent answer! For Python3, include: "from importlib import reload" in wrapper.py – Rexcirus Dec 08 '19 at 23:26
  • I can't find reload in Python 3 – Eliav Louski Aug 02 '21 at 09:23
  • @EliavLouski Did you put `from importlib import reload` in wrapper.py, as Rexcirus said above? – Geremia Jan 18 '23 at 04:20
11

To keep data in memory, the process must keep running. Memory belongs to the process running the script, NOT to the shell. The shell cannot hold memory for you.

So if you want to change your code and keep your process running, you'll have to reload the modules when they're changed. If any of the data in memory is an instance of a class that changes, you'll have to find a way to convert it to an instance of the new class. It's a bit of a mess. Not many languages were ever any good at this kind of hot patching (Common Lisp comes to mind), and there are a lot of chances for things to go wrong.

Dietrich Epp
  • Thank you very much for the informative answer. It's nice to know why a particular solution doesn't work the way I want it to, I appreciate your explanation. – machine yearning Jul 14 '11 at 02:19
11

If you only want to persist one object (or object graph) for future sessions, the shelve module is probably overkill. Just pickle the object you care about: do the work and save the pickle if there is no pickle file yet, or load the pickle file if there is one.

import os
import pickle  # cPickle on Python 2; plain pickle is fine on Python 3

pickle_filepath = "/path/to/picklefile.pickle"

if not os.path.exists(pickle_filepath):
    # Read data set from disk
    with open('mydata', 'r') as in_handle:
        mytext = in_handle.read()
    # Extract relevant results from data set
    mydata = parse_data(mytext)
    result = initial_operations(mydata)
    # Pickle files must be opened in binary mode
    with open(pickle_filepath, 'wb') as pickle_handle:
        pickle.dump(result, pickle_handle)
else:
    with open(pickle_filepath, 'rb') as pickle_handle:
        result = pickle.load(pickle_handle)
Matt Anderson
  • Pickle also tends to be faster than shelve – pufferfish Oct 04 '12 at 10:43
  • This answer is still technically correct, but I'd also suggest using JSON to read and write a cache file. It's also human readable and portable if you decide later to change the language. I don't know if JSON is "faster" or not than pickle (but realistically it's not going to impede code that's otherwise performant) – Scott Prive Feb 25 '20 at 14:46
  • "No module named 'cPickle'" in python 3.11 – PythoNic May 04 '23 at 15:29
7

Python's shelve is a file-based persistence solution for pickled (serialized) objects. The advantage is that it stores Python objects directly, so the API is pretty simple.

If you really want to avoid the disk, the technology you are looking for is an "in-memory database." Several alternatives exist; see this SO question: in-memory database in Python.
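As a minimal sketch of the idea using `sqlite3` from the standard library (bearing in mind that an in-memory database still lives and dies with the process that created it):

```python
import sqlite3

# ":memory:" keeps the entire database in RAM -- no file is ever written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (key TEXT PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO results VALUES (?, ?)", ("answer", 42.0))

# Later code (in the same process) can query the cached result.
row = conn.execute(
    "SELECT value FROM results WHERE key = ?", ("answer",)
).fetchone()
print(row[0])  # 42.0
```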

Ray Toal
3

Weirdly, none of the earlier answers here mention simple text files. The OP says they don't like the idea, but since this is becoming a canonical question for duplicates which might not have that constraint, this alternative deserves a mention. If all you need is for some text to survive between invocations of your script, save it in a regular text file.

def main():
    # Before start, read data from previous run
    try:
        with open('mydata.txt', encoding='utf-8') as statefile:
            data = statefile.read().rstrip('\n')
    except FileNotFoundError:
        data = "some default, or maybe nothing"

    updated_data = your_real_main(data)

    # When done, save new data for next run
    with open('mydata.txt', 'w', encoding='utf-8') as statefile:
        statefile.write(updated_data + '\n')

This easily extends to more complex data structures, though then you'll probably need to use a standard structured format like JSON or YAML (for serializing data with tree-like structures into text) or CSV (for a matrix of columns and rows containing text and/or numbers).
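For instance, a small dict round-trips cleanly through a JSON file (the filename and data here are made up for illustration):

```python
import json

# Hypothetical state you want to keep between runs
state = {"count": 3, "names": ["alice", "bob"]}

# Save at the end of one run...
with open("state.json", "w", encoding="utf-8") as f:
    json.dump(state, f, indent=2)

# ...and restore at the start of the next.
with open("state.json", encoding="utf-8") as f:
    restored = json.load(f)

print(restored == state)  # True
```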

Ultimately, shelve and pickle are just glorified generalized versions of the same idea; but if your needs are modest, the benefits of a simple textual format which you can inspect and update in a regular text editor, and read and manipulate with ubiquitous standard tools, and easily copy and share between different Python versions and even other programming languages as well as version control systems etc, are quite compelling.

As an aside, character encoding issues are a complication which you need to plan for; but in this day and age, just use UTF-8 for all your text files.

Another caveat is that beginners are often confused about where to save the file. A common convention is to save it in the invoking user's home directory, though that obviously means multiple users cannot share this data. Another is to save it in a shared location, but this then requires an administrator to separately grant write access to this location (except I guess on Windows; but that then comes with its own tectonic plate of other problems).

The main drawback is that text is brittle if you need multiple processes to update the file in rapid succession, and slow if you have lots of data and need to update parts of it frequently. For those use cases, look at a database instead (probably start with SQLite, which is robust and nimble, and included in the Python standard library; scale up to Postgres etc. if you have enterprise-grade needs).

And, of course, if you need to store native Python structures, shelve and pickle are still there.

tripleee
  • For the enterprise corporate department of redundancy department of the military-industrial complex of corporations and enterprises, I should maybe perhaps probably likely indeed by all means certainly for sure mention and make sure to memember to not forget to also include XML, that shibboleth behemoth juggernaut of a favorite of the enterprise corporate department of redundancy department of the military-industrial complex of corporations and enterprises. – tripleee Aug 25 '21 at 12:54
  • he said explicitly he didn't want to write to disk – Jules G.M. Sep 30 '21 at 21:10
  • @Jules The first two sentences explain why I posted this here anyway. – tripleee Oct 01 '21 at 04:15
2

This is an OS-dependent solution (it relies on named pipes, so it requires a Unix-like system)...

$ mkfifo inpipe

#!/usr/bin/python3
# first_process.py
complicated_calculation()
while True:
    with open('inpipe') as f:
        try:
            exec(f.read())
        except Exception as e:
            print(e)

$ ./first_process.py &
$ cat second_process.py > inpipe

This will allow you to change and redefine variables in the first process without copying or recalculating anything. It should be the most efficient solution compared to multiprocessing, memcached, pickle, shelve modules or databases.

This is really nice if you want to edit and redefine second_process.py iteratively in your editor or IDE until you have it right without having to wait for the first process (e.g. initializing a large dict, etc.) to execute each time you make a change.

John
  • Thanks, I want to use something like this as a lightweight alternative to an HTTP server for a compute on localhost to call from another language. That way I can prevent module loading overhead being repeated for each request. I wonder though, what does `with open('inpipe') as f` do when there is nothing being put in at the other end? – Milind R Feb 17 '19 at 07:45
  • The `open` doesn't make a copy of the memory? – Geremia Jan 19 '23 at 02:07
0

You could run a persistent script on the server through the OS which loads/calculates (and even periodically reloads/recalculates) the SQL data into in-memory structures of some sort, and then access the in-memory data from your other script through a socket.
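A rough sketch of that idea using `multiprocessing.connection` from the standard library (all names and the cached value here are illustrative; in reality the server and client would be separate processes):

```python
from multiprocessing.connection import Client, Listener
import threading

# One long-lived "server" holds the expensive result in memory and
# serves it to other scripts over a local socket.
def serve_result(result, listener):
    with listener.accept() as conn:   # wait for one client to connect
        conn.send(result)             # objects are pickled on the wire

# Port 0 lets the OS pick a free port; listener.address reports it.
listener = Listener(("localhost", 0), authkey=b"change-me")
server = threading.Thread(target=serve_result,
                          args=("expensive precomputed result", listener))
server.start()

# A "client" script connects and fetches the cached result instantly,
# instead of recomputing it.
with Client(listener.address, authkey=b"change-me") as conn:
    received = conn.recv()
server.join()
listener.close()
print(received)  # expensive precomputed result
```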

Stephen
0

You can do this but you must use a Python shell. In other words, the shell that you use to start Python scripts must be a Python process. Then, any global variables or classes will live until you close the shell.

Look at the cmd module, which makes it easy to write a shell program. You can even arrange for any commands that are not implemented in your shell to be passed to the system shell for execution (without closing your shell). Then you would have to implement some kind of command, prun for instance, that runs a Python script using the runpy module.

http://docs.python.org/library/runpy.html

You would need to use the init_globals parameter to pass your special data to the program's namespace, ideally a dict or a single class instance.
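A rough sketch of what that could look like (the shell class and the prun command name are invented for illustration):

```python
import cmd
import runpy

class PersistentShell(cmd.Cmd):
    """A tiny shell whose `cached` dict outlives individual script runs."""
    prompt = "(pshell) "

    def __init__(self):
        super().__init__()
        self.cached = {}  # lives as long as the shell process does

    def do_prun(self, script_path):
        """prun <script.py>: run a script with access to the shared cache."""
        # init_globals injects the same dict object into the script's
        # namespace, so mutations made by the script persist here.
        runpy.run_path(script_path, init_globals={"cached": self.cached})

    def do_exit(self, _):
        return True  # returning True ends cmdloop()

# PersistentShell().cmdloop()  # scripts run via prun see and mutate `cached`
```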

Michael Dillon