2

This pitfall is the number one hard to found bug since i work with python years ago.

Let's me show an oversimplified example, i have this files/dir:

[xiaobai@xiaobai import_pitfall]$ tree -F -C -a
.
├── import_all_pitall/
│   ├── hello.py
│   └── __init__.py
└── thread_test.py

1 directory, 3 files
[xiaobai@xiaobai import_pitfall]$

Content of thread_test.py:

[xiaobai@xiaobai import_pitfall]$ cat thread_test.py 
import time
import threading

def do_import1():
    print( "do_import 1A" )
    from import_all_pitall import hello
    print( "do_import 1B", id(hello), locals() )

def do_import2():
    print( "do_import 2A" )
    from import_all_pitall import hello as h
    print( "do_import 2B", id(h), locals() )

def do_import3():
    print( "do_import 3A" )
    import import_all_pitall.hello as h2
    #no problem if import different module #import urllib as h2
    print( "do_import 3B", id(h2), locals() )

print( "main 1" )
t = threading.Thread(target=do_import1)
print( "main 2" )
t.start()
print( "main 3" )
t2 = threading.Thread(target=do_import2)
print( "main 4" )
t2.start()
print( "main 5" )
print(globals()) #no such hello
#time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
#print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )
[xiaobai@xiaobai import_pitfall]$

Content of hello.py:

[xiaobai@xiaobai import_pitfall]$ cat import_all_pitall/hello.py
print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )
[xiaobai@xiaobai import_pitfall]$

While import_all_pitall/init.py is an empty file.

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python thread_test.py 
main 1
main 2
do_import 1A
 main 3
haha0
haha1
main 4
do_import 2A
main 5
{'do_import1': <function do_import1 at 0x7f9d884760c8>, 'do_import3': <function do_import3 at 0x7f9d884a6758>, 'do_import2': <function do_import2 at 0x7f9d884a66e0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'thread_test.py', 't2': <Thread(Thread-2, started 140314429765376)>, '__package__': None, 'threading': <module 'threading' from '/usr/lib64/python2.7/threading.pyc'>, 't': <Thread(Thread-1, started 140314438158080)>, 'time': <module 'time' from '/usr/lib64/python2.7/lib-dynload/timemodule.so'>, '__name__': '__main__', '__doc__': None}
do_import 3A
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
... #Forever

Look carefully, where does "do_import 2B" and "do_import 3B" ? It just hang on import instruction without even go to the first line of import because there's only one time.time() will be run. It hang just because the first import the same module on another thread/function in the "unfinished" loop state. My entire systems is big and multi-thread, super difficult to debug before i know the case.

After i comment out the '#undefined_func( "Done haha" )' in hello.py:

print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            #undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )

And run it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'do_import3': <function do_import3 at 0x7f31a462c048>, '__package__': None, 't2': <Thread(Thread-2, started 139851179529984)>, '__name__': '__main__', '__cached__': None, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>, '__doc__': None, 'do_import2': <function do_import2 at 0x7f31ac1d56a8>, 'do_import1': <function do_import1 at 0x7f31ac2c0bf8>, '__spec__': None, 't': <Thread(Thread-1, started 139851187922688)>, '__file__': 'thread_test.py', 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7f31ac297048>, '__builtins__': <module 'builtins' (built-in)>}
do_import 3A
haha0
haha1
haha -1
do_import 1B 139851188124312 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 139851188124312 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 3B 139851188124312 {'h2': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
main -1
[xiaobai@xiaobai import_pitfall]$ 

I print the id and figure out they all share the same id 139851188124312. So 3 functions share the same import object/process. But this doesn't make sense to me, i thought object is local to the function, because if i try to print imported "hello" object on global scope, it will throw error:

Edit the thread_test.py to print hello object at global scope:

...
print( "main 5" )
print(globals()) #no such hello
time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'t': <Thread(Thread-1, started 140404878976768)>, '__spec__': None, 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__cached__': None, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7fb296b87048>, 'do_import2': <function do_import2 at 0x7fb296ac56a8>, 'do_import1': <function do_import1 at 0x7fb296bb0bf8>, '__doc__': None, '__file__': 'thread_test.py', 'do_import3': <function do_import3 at 0x7fb28ef19f28>, 't2': <Thread(Thread-2, started 140404870584064)>, '__name__': '__main__', '__package__': None, '__builtins__': <module 'builtins' (built-in)>, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>}
haha0
haha1
haha -1
do_import 1B 140404879178392 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 140404879178392 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
Traceback (most recent call last):
  File "thread_test.py", line 31, in <module>
    print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
NameError: name 'hello' is not defined
[xiaobai@xiaobai import_pitfall]$ 

hello is not global, but why it can be share by different thread's in different functions ? Why python don't allow unique local import ? Why python share the import process, and it make all other threads just "wait" by no good reason just because one thread hang in the process of import ?

林果皞
  • 7,539
  • 3
  • 55
  • 70

2 Answers2

3

To answer one of the questions -

I print the id and figure they all share the same id 140589697897480. So 3 functions share the same import object/process.

Yes, when you import a module, python imports the module object and caches it in sys.modules . And then for any subsequent imports for that module, python takes the module object from sys.modules and returns that, it does not import again.

For the second part of that same question -

But this doesn't make sense to me, i though object is local to the function, because if i try to print imported "hello" object on global scope, it will throw error

Well, sys.modules is not local, but yes the name hello is local to the function. As stated above, if you try to import the module again, python will first lookup sys.modules to see if it has already been imported or not, and return if it contains the module , otherwise import it and add to sys.modules.


For the first program, when a python module is imported, it is run from the top level, in your hello.py you have an inifite loop - while 1: , since 1 is always true. So the import never finishes.

If you do not want the infinite loop to run, you should put the code that you do not want to run when importing a module inside -

if __name__ == '__main__':

The code inside the above if statement would only run , if the script is directly run, it will not run when the module is imported.


I am guessing when you say -

After i comment out the '#undefined_func( "Done haha" )' in hello.py

You actually commented out the complete infinite loop, and hence the importing was successful.

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
  • Yes then, the infinte loop is removed , and hence importing works . My answer should answer your other questions . – Anand S Kumar Aug 13 '15 at 08:21
  • is thre any documentation warn about this ? i.e. unfinished import in one thread make other threads hang. – 林果皞 Aug 13 '15 at 08:50
  • 1
    I cannot find any documentation, but you can check out the answer for question - http://stackoverflow.com/questions/12389526/import-inside-of-a-python-thread - They clearly state that python imports are thread safe, so that means when importing a module it first acquires a lock, and if any other thread at the same time tries to import the module it would wait for that lock. – Anand S Kumar Aug 13 '15 at 09:00
1

I recommand you to print the threading.current_thread().name and name your threads in all your prints. It would be really easier to understand who did this action.

Look carefully, where does "do_import 2B" and "do_import 3B" ?

Python is currently loading the module, and the Python import processus is thread safe. It implies that two thread can not load the module in the same time. It's not about handling a time.time(), it's about having a lock on the file.

I print the id and figure they all share the same id 140589697897480

Yes, because Python load a module only one time. Considere your Python module as singleton.

Hello is not global, but why it can be share by different thread's in different functions ?

This is because hello is a local variable that points to a shared module. If as said before you considere your module as singleton, and then you considere that all the memory between threads in a same process is shared, then the singleton is shared with all threads.

As a lot of people said, it's not a bug, it's a feature :)


Here is another example. Let's considere 2 files : main.py which is the executed file, and other.py which is the imported file.

Here is main.py :

import threading
import logging
logging.basicConfig(level=logging.INFO)

def do_import_1():
    import other
    logging.info("I am %s and who did the import job ? %s", threading.current_thread().name, other.who_did_the_job.name)

def do_import_2():
    import other
    logging.info(other.who_did_the_job.name)
    logging.info("I am %s and who did the import job ? %s", threading.current_thread().name, other.who_did_the_job.name)

thread_import_1 = threading.Thread(target=do_import_1, name="Thread import 1")
thread_import_2 = threading.Thread(target=do_import_2, name="Thread import 2")

thread_import_1.start()
thread_import_2.start()

And here is other.py

import threading

who_did_the_job = threading.current_thread()
print "Thread loading the module : ", who_did_the_job.name

I used logging in order to avoid issue when 2 threads try to write in stdout at the same time. Here is the result I get (with python 2.7) :

Thread loading the module :  Thread import 1
INFO:root:I am Thread import 1 and who did the import job ? Thread import 1
INFO:root:Thread import 1
INFO:root:I am Thread import 2 and who did the import job ? Thread import 1

As you can see, the module is only imported one time.

FunkySayu
  • 7,641
  • 10
  • 38
  • 61
  • is thre any documentation warn about this ? i.e. unfinished import in one thread make other threads hang. – 林果皞 Aug 13 '15 at 08:50
  • 1
    Yes : https://docs.python.org/2/library/threading.html#importing-in-threaded-code – FunkySayu Aug 13 '15 at 08:51
  • Thanks. But the documentation is incredibly vague. It doesn't point out about sys.modules, "import same module" and "import process is shared". And not even exist on python3 documentation at https://docs.python.org/3/library/threading.html – 林果皞 Aug 13 '15 at 09:07