I'm experimenting with torch and Kalman filters (filterpy), and I import them in a slightly non-standard way via importlib, roughly like this:

import importlib


class Foo():
    def __init__(self):
        pass

    def tst(self):
        torch = importlib.import_module("torch")
        kalman = importlib.import_module("filterpy.kalman")


foo = Foo()
foo.tst()

My program does not spawn any threads (at least I'm not spawning any myself, and threading.enumerate() returns only MainThread).
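For reference, a minimal sketch of the kind of thread check described above. The helper names (`python_thread_names`, `native_thread_count`, `report`) are hypothetical, and `json` is only a stand-in for the real imports. One caveat: threading.enumerate() only lists threads known to Python's threading module, so threads started natively by an extension library would not show up there. On Linux, counting the entries in /proc/self/task gives the OS-level view, which may or may not differ here.

```python
import importlib
import os
import threading


def python_thread_names():
    """Names of the threads that the threading module knows about."""
    return sorted(t.name for t in threading.enumerate())


def native_thread_count():
    """OS-level thread count for this process (Linux only), else None."""
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return None
    return len(os.listdir(task_dir))


def report(label):
    # Print both views side by side so they can be compared.
    print(label, python_thread_names(), native_thread_count())


report("before import:")
importlib.import_module("json")  # stand-in for torch / filterpy.kalman
report("after import: ")
```

If the second number grows after the real imports while the first list stays at MainThread only, the process does have extra native threads, although that alone would not prove they are responsible for the error.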

I'm facing the following problem:

Every now and again (in roughly 10% of attempts) I see a RuntimeError raised from scipy during that import:

  File "/opt/my_package/imp.py", line 492, in import_class
    imported_module = importlib.import_module("filterpy.kalman")
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/venv/lib/python3.8/site-packages/filterpy/kalman/__init__.py", line 22, in <module>
    from .EKF import *
  File "/opt/venv/lib/python3.8/site-packages/filterpy/kalman/EKF.py", line 28, in <module>
    from filterpy.stats import logpdf
  File "/opt/venv/lib/python3.8/site-packages/filterpy/stats/__init__.py", line 22, in <module>
    from .stats import *
  File "/opt/venv/lib/python3.8/site-packages/filterpy/stats/stats.py", line 32, in <module>
    from scipy.stats import norm, multivariate_normal
  File "/opt/venv/lib/python3.8/site-packages/scipy/stats/__init__.py", line 391, in <module>
    from .stats import *
  File "/opt/venv/lib/python3.8/site-packages/scipy/stats/stats.py", line 180, in <module>
    from . import distributions
  File "/opt/venv/lib/python3.8/site-packages/scipy/stats/distributions.py", line 12, in <module>
    from . import _discrete_distns
  File "/opt/venv/lib/python3.8/site-packages/scipy/stats/_discrete_distns.py", line 1266, in <module>
    pairs = list(globals().items())

The odd part is that I get this error when running my dockerized code on one machine (an AWS G4 instance) but not on another machine with the same Docker image. On the affected machine the error is much more likely if I run the code right after boot; the longer the machine has been up, the less likely it is to appear.

I found this closed ticket in the scipy repository: https://github.com/scipy/scipy/issues/11479 - but the problem does not seem to have been diagnosed there (at least I can't find a proper explanation of the issue).

There was another question asked on SO here, but it did not get any answer apart from a useful comment about calls to list(): Bug in Python 3.5? list(globals().items()) caused RuntimeError: dictionary changed size during iteration

There was also this question on SO: Python: globals().items() iterations try to change a dict. However, scipy does not iterate over globals().items() directly in a loop; it first copies the items into a list with list(). The problem is also not easily reproducible: it does not manifest on every run and, from what I observe, appears randomly. According to that question, adding pairs = [] before pairs = list(globals().items()) would solve the problem, but if that were the cause we should see the error on every run.
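To make the contrast concrete, here is a small single-threaded sketch (the function names are hypothetical) of the deterministic case from that other question versus the snapshot pattern. Mutating a dict inside a loop over the dict itself fails every time, whereas looping over a list() snapshot does not, which is why the intermittent failure here looks different.

```python
def mutate_while_iterating():
    """Add keys while looping over the dict itself: always fails."""
    d = {"a": 1, "b": 2}
    try:
        for key in d:
            d[key + "_copy"] = d[key]
    except RuntimeError as exc:
        return str(exc)
    return None


def mutate_over_snapshot():
    """Loop over a list() snapshot of the items: mutation is safe."""
    d = {"a": 1, "b": 2}
    for key, value in list(d.items()):
        d[key + "_copy"] = value
    return d


print(mutate_while_iterating())
print(sorted(mutate_over_snapshot()))
```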

I understand that list(globals().items()) could fail this way in a multi-threaded program. But I only have MainThread, so how can globals() change while it is being iterated over?
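While the cause is unclear, one possible stopgap is to retry the import when exactly this error appears. The following is only a sketch under assumptions: `import_with_retry` is a hypothetical helper, not part of importlib, and it assumes that a failed import can simply be attempted again, which seems plausible given that the error is intermittent but has not been verified against this setup.

```python
import importlib


def import_with_retry(name, attempts=3):
    """Import `name`, retrying only on the intermittent dict-size error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return importlib.import_module(name)
        except RuntimeError as exc:
            transient = "changed size during iteration" in str(exc)
            # Re-raise unrelated errors, and give up after the last attempt.
            if not transient or attempt == attempts:
                raise


json_module = import_with_retry("json")  # stand-in for "filterpy.kalman"
print(json_module.__name__)
```

This hides the symptom rather than explaining it, so it is a workaround at best.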

Even though I'm running single-threaded code I tried to wrap my code like this (as suggested here: Forcing a thread to block all other threads from executing) but without luck:

import sys

original_interval = sys.getswitchinterval()
sys.setswitchinterval(1000)  # value is in seconds
# imports
sys.setswitchinterval(original_interval)
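The same idea can be written as a context manager so that the original interval is restored even if an import raises. This is a sketch only; `switch_interval` is a hypothetical helper name, and `json` stands in for the real imports.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change the interpreter's thread switch interval."""
    original = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Always restore, even when the body raises.
        sys.setswitchinterval(original)


with switch_interval(1.0):
    importlib.import_module("json")  # stand-in for the real imports
```

The switch interval only affects how often the interpreter asks the running Python thread to give up the GIL, so it would not influence anything that happens outside of that mechanism.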

I also looked at the disassembly of pairs = list(globals().items()), but since I have no experience reading bytecode I can't spot any issue:

import dis
dis.dis("pairs = list(globals().items())")

  1           0 LOAD_NAME                0 (list)
              2 LOAD_NAME                1 (globals)
              4 CALL_FUNCTION            0
              6 LOAD_METHOD              2 (items)
              8 CALL_METHOD              0
             10 CALL_FUNCTION            1
             12 STORE_NAME               3 (pairs)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
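One way to read that listing: there is no bytecode-level loop in it, so the iteration over the items happens inside the single call to list(). A small sketch comparing it with an explicit for loop, using dis.get_instructions (the helper name `opnames` is hypothetical, and exact opcode names other than FOR_ITER vary between Python versions):

```python
import dis


def opnames(source):
    """Opcode names of the top-level code compiled from `source`."""
    return [ins.opname for ins in dis.get_instructions(source)]


snapshot_ops = opnames("pairs = list(globals().items())")
loop_ops = opnames("for pair in globals().items():\n    pass")

# The list() form has no FOR_ITER; the explicit loop does.
print("FOR_ITER" in snapshot_ops)
print("FOR_ITER" in loop_ops)
```

So the interpreter never gets to a point between two bytecodes in the middle of this iteration, which makes the error all the more puzzling.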

I'm running Python 3.8.0 inside an nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 container, with scipy 1.6.3. I'm not sure whether it is relevant, but I also have numpy 1.20.3 and torch 1.7.1+cu110.

Greg0ry
  • Which version of SciPy are you using? You can check with `import scipy; print(scipy.__version__)` – Warren Weckesser Aug 24 '21 at 17:44
  • Ah, I forgot to post - it's `1.6.3` – Greg0ry Aug 24 '21 at 17:46
  • We had a similar problem for continuous distributions in the past, we may be able to use a similar fix for this. Do you have a MWE that we could use to test this issue? – Andrew Nelson Aug 24 '21 at 23:39
  • I'm still investigating the root cause of this, and it doesn't help that I cannot reproduce it every time; it happens randomly. It looks like the scipy team has already provided a fix in their repository, though I'd love to understand why this is happening. – Greg0ry Aug 25 '21 at 10:09
  • @Greg0ry What happens if you import this module the "normal" way? I.e. just `import filterpy.kalman`; do you still observe this behavior? Also, did you report this problem to https://bugs.python.org/? Even if it's not a bug, there might be useful advice on why exactly this is happening. Actually I thought `list(d.items())` to be thread-safe since thread switching can only happen between bytecodes (?). – a_guest Aug 25 '21 at 13:20
  • Thanks @a_guest, I can't reproduce it if I go for the `import ...` option; however, the system I'm working on requires me to use `importlib`. I am trying to refine a minimal working example where the issue would be reliably reproducible outside of the AWS G4 instance, but it is proving challenging. I also had the same thoughts on `list(d.items())`, but on the other hand this is a single-threaded program and I am definitely not an expert when it comes to the inner workings of the CPython interpreter... – Greg0ry Aug 25 '21 at 13:48
