This pitfall is the number one hard-to-find bug I have hit since I started working with Python years ago.
Let me show an oversimplified example. I have these files and directories:
[xiaobai@xiaobai import_pitfall]$ tree -F -C -a
.
├── import_all_pitall/
│ ├── hello.py
│ └── __init__.py
└── thread_test.py
1 directory, 3 files
[xiaobai@xiaobai import_pitfall]$
Content of thread_test.py:
[xiaobai@xiaobai import_pitfall]$ cat thread_test.py
import time
import threading

def do_import1():
    print( "do_import 1A" )
    from import_all_pitall import hello
    print( "do_import 1B", id(hello), locals() )

def do_import2():
    print( "do_import 2A" )
    from import_all_pitall import hello as h
    print( "do_import 2B", id(h), locals() )

def do_import3():
    print( "do_import 3A" )
    import import_all_pitall.hello as h2
    #no problem if import different module #import urllib as h2
    print( "do_import 3B", id(h2), locals() )

print( "main 1" )
t = threading.Thread(target=do_import1)
print( "main 2" )
t.start()
print( "main 3" )
t2 = threading.Thread(target=do_import2)
print( "main 4" )
t2.start()
print( "main 5" )
print(globals()) #no such hello
#time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
#print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )
[xiaobai@xiaobai import_pitfall]$
Content of hello.py:
[xiaobai@xiaobai import_pitfall]$ cat import_all_pitall/hello.py
print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )
[xiaobai@xiaobai import_pitfall]$
import_all_pitall/__init__.py is an empty file.
Let's run it:
[xiaobai@xiaobai import_pitfall]$ python thread_test.py
main 1
main 2
do_import 1A
main 3
haha0
haha1
main 4
do_import 2A
main 5
{'do_import1': <function do_import1 at 0x7f9d884760c8>, 'do_import3': <function do_import3 at 0x7f9d884a6758>, 'do_import2': <function do_import2 at 0x7f9d884a66e0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'thread_test.py', 't2': <Thread(Thread-2, started 140314429765376)>, '__package__': None, 'threading': <module 'threading' from '/usr/lib64/python2.7/threading.pyc'>, 't': <Thread(Thread-1, started 140314438158080)>, 'time': <module 'time' from '/usr/lib64/python2.7/lib-dynload/timemodule.so'>, '__name__': '__main__', '__doc__': None}
do_import 3A
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
... #Forever
Look carefully: where are "do_import 2B" and "do_import 3B"? The other two imports simply hang at the import statement and never even reach the first line of the module. You can tell because time.time() ran only once (the printed timestamp never changes). They hang only because the first import of the same module, in another thread/function, is still stuck in its unfinished loop. My whole system is big and multi-threaded, so this was super difficult to debug before I knew about this case.
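The waiting behaviour can be reproduced in a self-contained way. This is only a minimal sketch for Python 3, not my real code: the module name slow_hello_demo and the helper run_demo are made up for illustration, and a 0.5 second sleep stands in for the stuck loop so that the demo actually finishes.

```python
import importlib
import os
import sys
import tempfile
import threading
import time

MODULE_NAME = "slow_hello_demo"  # hypothetical name, used only in this sketch
MODULE_SOURCE = """\
import threading, time
time.sleep(0.5)  # stands in for the stuck do_task() loop, but finite
EXEC_THREAD = threading.current_thread().name  # thread that ran the module body
"""

def worker(results):
    start = time.time()
    # Goes through the same import machinery as an import statement.
    mod = importlib.import_module(MODULE_NAME)
    results.append((threading.current_thread().name, id(mod),
                    mod.EXEC_THREAD, round(time.time() - start, 2)))

def run_demo():
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, MODULE_NAME + ".py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            threads = [threading.Thread(target=worker, args=(results,),
                                        name="T%d" % i) for i in (1, 2)]
            threads[0].start()
            time.sleep(0.1)  # give T1 a head start so it enters the import first
            threads[1].start()
            for t in threads:
                t.join()
        finally:
            # Clean up so the demo leaves no trace in this interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    return results

if __name__ == "__main__":
    for row in run_demo():
        print(row)  # (thread name, id(module), EXEC_THREAD, seconds spent importing)
```

Both rows should report the same module id and the same EXEC_THREAD: the module body runs once, and the second thread just waits inside its import until the first one is done. If the sleep were an endless loop, as in my hello.py, the second thread would wait forever.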
After I comment out the undefined_func( "Done haha" ) line in hello.py:
print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            #undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )
And run it:
[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'do_import3': <function do_import3 at 0x7f31a462c048>, '__package__': None, 't2': <Thread(Thread-2, started 139851179529984)>, '__name__': '__main__', '__cached__': None, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>, '__doc__': None, 'do_import2': <function do_import2 at 0x7f31ac1d56a8>, 'do_import1': <function do_import1 at 0x7f31ac2c0bf8>, '__spec__': None, 't': <Thread(Thread-1, started 139851187922688)>, '__file__': 'thread_test.py', 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7f31ac297048>, '__builtins__': <module 'builtins' (built-in)>}
do_import 3A
haha0
haha1
haha -1
do_import 1B 139851188124312 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 139851188124312 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 3B 139851188124312 {'h2': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
main -1
[xiaobai@xiaobai import_pitfall]$
I printed the ids and found that all three are the same, 139851188124312, so the three functions share the same imported module object and the same import process. But this doesn't make sense to me. I thought the imported object was local to the function, because if I try to print the imported "hello" object at global scope, it throws an error.
Edit thread_test.py to print the hello object at global scope:
...
print( "main 5" )
print(globals()) #no such hello
time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )
Let's run it:
[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'t': <Thread(Thread-1, started 140404878976768)>, '__spec__': None, 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__cached__': None, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7fb296b87048>, 'do_import2': <function do_import2 at 0x7fb296ac56a8>, 'do_import1': <function do_import1 at 0x7fb296bb0bf8>, '__doc__': None, '__file__': 'thread_test.py', 'do_import3': <function do_import3 at 0x7fb28ef19f28>, 't2': <Thread(Thread-2, started 140404870584064)>, '__name__': '__main__', '__package__': None, '__builtins__': <module 'builtins' (built-in)>, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>}
haha0
haha1
haha -1
do_import 1B 140404879178392 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 140404879178392 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
Traceback (most recent call last):
  File "thread_test.py", line 31, in <module>
    print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
NameError: name 'hello' is not defined
[xiaobai@xiaobai import_pitfall]$
hello is not global, so why can it be shared by different threads in different functions? Why doesn't Python allow a unique local import? And why does Python share the import process, making all the other threads wait for no good reason just because one thread hangs in the middle of the import?
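For reference, the observation itself (same id, but no global name) can be boiled down to a few lines. This is a minimal sketch for Python 3 that uses the standard json module instead of my hello.py; the function names first and second and the aliases mod_a and mod_b are made up for illustration. It only shows what happens, not why it was designed this way: the module object is stored once in sys.modules, and each import statement only binds a name in the scope where it runs.

```python
import sys

def first():
    import json as mod_a  # binds the name mod_a only inside first()
    return mod_a

def second():
    import json as mod_b  # binds the name mod_b only inside second()
    return mod_b

# Both functions hand back the very same object that is cached in sys.modules.
same_object = first() is second() is sys.modules["json"]

# Neither alias leaks into the global scope of this script.
leaked = "mod_a" in globals() or "mod_b" in globals()

print(same_object, leaked)
```

So the name is local to each function, while the module object behind it is shared process-wide, which matches the ids printed by do_import1, do_import2 and do_import3.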