
I'd like to be able to run a python file (file1) that simply loads several large files into memory as python objects, then, with a different python file (file2), access those same objects without having to reload the files into memory a second time. The motivation is that I want to be able to iteratively modify/develop file2 without having to waste time reloading the same large files on every iteration.

In a Jupyter notebook this is easily accomplished by running a cell that loads the files once; these objects are then accessible to all other cells in the notebook. I'd like to be able to establish this same cross talk between separate python files.

Is there a way to establish this Jupyter-style, within-notebook cell-to-cell sharing of Python objects between separate .py files?


(Edited to include an example)

Below is an example scenario; let's say there are two files:

file1.py:

from sklearn.externals import joblib
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

#notice that the q, p objects are never loaded, but they are accessible 
#(this is possible if the contents of these py files are in separate cells
#in a Jupyter notebook)
for experiment, chromosome in q.iteritems():
    pass #stuff to work with dict object
for experiment, chromosome in p.iteritems():
    pass #stuff to work with dict object

I want to do

python file1.py

once, and then do

python file2.py

an arbitrary number of times (i.e. iteratively modify the code in file2). Notice that in this scenario the objects created in file1.py are accessible to file2.py. My question is: Is this scenario possible?

Ryan
  • What do you mean by "Sharing objects between files"? Do you mean "between threads"? – Vsevolod Timchenko Jun 24 '16 at 09:49
  • I think the question is pretty clear as it is – Ryan Jun 24 '16 at 09:51
  • 1
    No, it isn't. I have the same question. A "file" is not something that does anything in Python. It may be the main module, it may be imported, and so on, but then its activity depends on *that*, not on it being a "file". Please clarify. If multiple people say your question is unclear, it is unclear. – Rory Daulton Jun 24 '16 at 09:53
  • @Ryan. Oh, okay. Despite the fact that files have no direct connection to objects and "Sharing objects between files" doesn't even make sense as a question. – Vsevolod Timchenko Jun 24 '16 at 09:54
  • It appears as though I'm lacking (at least) some knowledge of how python objects are stored in memory. Despite this, I can't see how I was anything but completely clear on what my goal is, so it would be nice to get some constructive feedback or help in clarifying exactly what I'm missing – Ryan Jun 24 '16 at 09:58
  • @Ryan as I pointed out you can reload modules, so you can put your data in module1, functions in module2, modify functions, reload module2 and repeat. That said, it can create some problems (read the quoted question on them). Personally I think jupyter is alright for "trying"/"toying", and then you can later copy the "ready" code into a file. – syntonym Jun 24 '16 at 10:05
  • The aim of the [Singleton pattern](https://en.wikipedia.org/wiki/Singleton_pattern) seems to fit exactly what you need – Luc Giffon Jun 24 '16 at 12:14

1 Answer


Objects do not belong to a specific file. The class they belong to, or the function that creates them, may be defined in a module that "physically" resides in a different file, but this doesn't matter: as long as you stay within a single Python interpreter session, objects never need to be copied.

There is one catch: if you modify a module and want to load the newest version into a running Python interpreter that has already imported that module, the interpreter will "refuse" to do so. (This is actually a performance optimization, so that you can safely import modules more than once.)

You can "force" the Python interpreter to reload a module with `importlib.reload` in Python 3 (or the `reload` builtin in Python 2). See this question for more information.
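As a self-contained sketch of that behavior (the module name `demo_mod` and the temp-file setup are made up purely for illustration), a repeated `import` is a no-op, while `importlib.reload` re-executes the module:

```python
import importlib
import os
import sys
import tempfile
import time

# Create a throwaway module on disk (the name "demo_mod" is invented here)
tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "demo_mod.py")
with open(path, "w") as f:
    f.write("VALUE = 1\n")
importlib.invalidate_caches()  # make sure the importer sees the new file

import demo_mod
print(demo_mod.VALUE)  # 1

# Modify the module on disk
with open(path, "w") as f:
    f.write("VALUE = 2\n")
# Bump the mtime so cached bytecode can't be mistaken for current
os.utime(path, (time.time() + 10, time.time() + 10))
importlib.invalidate_caches()

import demo_mod        # no-op: a module is only executed once per session
print(demo_mod.VALUE)  # still 1

importlib.reload(demo_mod)  # forces re-execution of the module's code
print(demo_mod.VALUE)  # 2
```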

In your example the data will not be shared, because you are running two different Python processes. In general, data is not shared between two processes; two C programs don't share any data either. Processes can send data to each other, but that requires copying, which is exactly what you want to avoid.

But you can import the data and functions into a "shared" python interpreter.

file1.py:

from sklearn.externals import joblib
q = joblib.load('large_dict') #load a large dictionary that has been saved to disk; let's say it takes 2 minutes to load
p = joblib.load('large_dict2') #load another large file; another 2 minutes load time

file2.py:

from file1 import q, p
#notice that the q, p objects are never loaded here, but they are accessible
#(just as if the contents of these py files were in cells
#in a Jupyter notebook)
for experiment, chromosome in q.iteritems(): # use .items() in Python 3
    pass #stuff to work with dict object
for experiment, chromosome in p.iteritems():
    pass #stuff to work with dict object

file3.py:

import importlib
import file2
# file2.py is executed once when imported;
# its attributes are then accessible as file2.attribute_name

inp = ""
while inp != "quit":
    inp = input("Type anything to reload or 'quit' to quit")

    # 'import file2' would **not** execute file2 because imports
    # are only done once. Use importlib.reload (or reload in python2)
    # to "force" reloading of module 
    importlib.reload(file2)

Then you start it with python file3.py, and it will wait for your input before reloading file2. Of course you can make the reload trigger arbitrarily complex, e.g. reload whenever file2.py changes (the watchdog library might be helpful for that).
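A minimal change-detection sketch using only the standard library (a polling check on the file's modification time; watchdog would do this event-driven and more robustly — the helper and the driver loop below are illustrative, not part of the original answer):

```python
import os

def mtime_changed(path, last_mtime):
    """Return the file's new mtime if it differs from last_mtime, else None."""
    mtime = os.path.getmtime(path)
    return mtime if mtime != last_mtime else None

# Hypothetical driver replacing the input() loop in file3.py:
#
#   import time, importlib
#   import file2                        # q and p stay loaded via file1
#   last = os.path.getmtime("file2.py")
#   while True:
#       time.sleep(1)                   # poll once a second
#       new = mtime_changed("file2.py", last)
#       if new is not None:
#           last = new
#           importlib.reload(file2)     # re-execute the edited file2.py
```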

Another way would be to use something like

file4.py:

import importlib
import file2
def reload():
    importlib.reload(file2)

and then use python -i file4.py. You are then in a normal Python interpreter, but calling reload() will reload (i.e. re-execute) file2.

Note that you can do the same in a Jupyter/IPython notebook. There are even some magic commands to help you with that; see the IPython documentation for more information.
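Concretely, the relevant IPython magics are `%load_ext autoreload` and `%autoreload` (run in a notebook or IPython cell, not a plain interpreter; importing file2 here is just the example from above):

```
%load_ext autoreload
%autoreload 2      # reload all imported modules before executing each cell

import file2       # subsequent edits to file2.py are picked up automatically
```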

syntonym
  • I don't see any proposals for how I would accomplish my stated goal or a clear answer to the question – Ryan Jun 24 '16 at 09:50
  • Your question doesn't really make sense, because objects have no connection to .py files. So you cannot share objects between .py files, because that doesn't make sense. If you have two modules (files) and import the first one into the second, all objects will be available in the second module (without copying). – syntonym Jun 24 '16 at 09:53
  • Then, if I'm not mistaken, the answer is simply, "No." – Ryan Jun 24 '16 at 10:11
  • 1
    No, the behaviour you want (having objects in different modules but not needing to load from file again) is the default behavior. There are problem when modifying a file at runtime but these can be circumvented by using `reload`. So you can already do what you want to do. If you have a concrete problem (e.g. "I tried to reload module xyz but when I execute the function it still uses the old code") add some details. But without the any concrete featues you need the answer to your question is "python already does that". – syntonym Jun 24 '16 at 10:28
  • I've added an example scenario – Ryan Jun 24 '16 at 10:55
  • @Ryan I've taken your example into account. So what you had in mind (executing `python file2` without it needing to load p and q, and without copying from a different process) is indeed not possible. Hopefully the approach I presented fits your use case. Note that ipython/jupyter even has some magic commands to auto-reload files, which might fit your case. – syntonym Jun 24 '16 at 11:37
  • Thanks - your example worked nicely and the explanation was helpful – Ryan Jun 24 '16 at 13:39