
I have a Python codebase I'm using for research. The codebase imports libraries like NumPy and PyTorch, but also some custom tooling (i.e. other Python packages I've written and want to use). These custom packages are installed into a virtual environment using `pip install -e .`.

My workflow is such that I will launch a long-running job (a week or so) and then continue to modify or refactor the codebase in parallel. I'm becoming increasingly suspicious (paranoid?) that some (not all) of these modifications are changing runtime behavior.

Unfortunately, I have not been able to isolate this into a concrete example. Instead, I feel like Python is gaslighting me with inexplicable results.

Is there some interaction between Python's garbage collector and editable installs that causes some modules to be reloaded? Or maybe not all modules are loaded upfront?

I have explicitly seen this behavior when making large changes, such as:

 - Run the "experiment" script
    - The "experiment" script imports the package `tools`
 - While the experiment script is running, update `tools` from v1.0 to v2.0
 - An assertion in the "experiment" script that checks
   `tools.__version__ == "1.0"` then fails and crashes the run

Ben
  • [possibly related question](https://stackoverflow.com/questions/19077381/what-happens-when-a-module-is-imported-twice) – timgeb Nov 12 '20 at 16:05

1 Answer


I think your bold statement already has the answer in it.

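What might be happening is that the editable flag `-e` is causing the issue. As far as I understand it, though, this should not happen inside a running process, because a module that has already been imported is not re-read from disk. However, if the Python kernel (the interpreter process) is restarted somewhere in the script, the new process would load the modified code.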
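To make that distinction concrete, here is a minimal sketch (not your code). The module names `tools_eager` and `tools_lazy` and the helper `write_version` are made up for illustration, and a temporary directory on `sys.path` stands in for an editable install. It suggests three cases: a module imported before the edit keeps its old contents, a module first imported after the edit picks up the new source, and a freshly started interpreter sees the new source as well.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# Keep the demo deterministic: without this, a cached .pyc written within the
# same second as the edit could mask the change in the child process.
sys.dont_write_bytecode = True

# Hypothetical stand-in for an editable install: a source directory on sys.path.
src_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(src_dir))


def write_version(module_name: str, version: str) -> None:
    """Overwrite the module's source file, as an editor or refactor would."""
    (src_dir / f"{module_name}.py").write_text(f'__version__ = "{version}"\n')


# Both modules start at 1.0, but only one is imported before the "refactor".
write_version("tools_eager", "1.0")
write_version("tools_lazy", "1.0")
importlib.invalidate_caches()
import tools_eager  # imported at startup, now cached in sys.modules

# Simulate editing the codebase while the job is running.
write_version("tools_eager", "2.0")
write_version("tools_lazy", "2.0")

# Case 1: a repeated import returns the cached module object, not the new file.
import tools_eager as tools_eager_again
eager_version = tools_eager_again.__version__

# Case 2: a deferred import reads the source from disk only now.
importlib.invalidate_caches()
import tools_lazy
lazy_version = tools_lazy.__version__

# Case 3: a fresh interpreter imports whatever is on disk at that moment.
child_code = (
    "import sys; sys.path.insert(0, sys.argv[1]); "
    "import tools_eager; print(tools_eager.__version__)"
)
child = subprocess.run(
    [sys.executable, "-c", child_code, str(src_dir)],
    capture_output=True, text=True, check=True,
)
child_version = child.stdout.strip()

print(eager_version, lazy_version, child_version)  # prints: 1.0 2.0 2.0
```

If something like cases 2 or 3 occurs in a long-running job (imports inside functions, worker processes started late), it could explain why only some modifications appear to change behavior.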

A solution would then be to not use the `-e` flag when installing the packages into the virtual environment where your script runs (e.g. `pip install .` or `python setup.py install`). A regular install copies the package into the environment, so it is also preferable if you do not want the running environment to implicitly pick up every change to the source tree.
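If you want the experiment script to verify this at startup, a check along the following lines might help. It is only a sketch: the helper name `loaded_from_site_packages` is made up, and it assumes that a regular install places the package under the `purelib` or `platlib` directory reported by `sysconfig`, whereas an editable install is loaded from the source checkout. It does not cover user site directories or unusual layouts.

```python
import json  # stand-in for the real package; stdlib, so not in site-packages
import sysconfig
from pathlib import Path


def loaded_from_site_packages(module) -> bool:
    """Return True if the module's file lives under this environment's site-packages.

    Hypothetical helper: assumes regular installs end up in the `purelib` or
    `platlib` directory, while an editable install resolves to the checkout.
    """
    module_file = getattr(module, "__file__", None)
    if module_file is None:  # built-in or namespace package
        return False
    module_path = Path(module_file).resolve()
    roots = {
        Path(sysconfig.get_paths()[key]).resolve()
        for key in ("purelib", "platlib")
    }
    return any(root in module_path.parents for root in roots)


# Example: log where a package was loaded from before the long run starts.
print(json.__file__, loaded_from_site_packages(json))
```

In the experiment script, one could call this on `tools` right after importing it and abort (or at least log a warning) when it returns `False`.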

  • Can you expand your answer, please? What type of things would cause the Python kernel to restart during runtime? – Ben Nov 12 '20 at 16:35
  • Not sure, could be many things. For example, if your script spawns a set of Python processes and a new one is spawned after the update, it will use the new version. But why would you want to use `-e` for this script's environment in the first place? – Thom Marchesini Nov 12 '20 at 16:42
  • Based on your description, I'm not sure a normal pip install would help. Let's say I launch experiment 1, and then for experiment 2 I want to test a new change in `tools` (which I want to launch in parallel with experiment 1). I'll still be forced to re-run pip install for `tools`, which applies those changes to the shared environment. – Ben Nov 12 '20 at 16:48
  • How about separating those environments? – Thom Marchesini Nov 12 '20 at 16:52
  • I guess Docker would be the safe bet – Ben Nov 12 '20 at 16:58