First of all, we should all quickly review what threads are: http://en.wikipedia.org/wiki/Thread_%28computer_science%29
OK, so threads share memory, so this should be easy! That is both the good and the bad thing about threads: sharing is easy, and it is dangerous. (Threads are also lightweight for the OS.)
Now, if you are using Python with CPython, you should familiarize yourself with the global interpreter lock (GIL):
http://docs.python.org/glossary.html#term-global-interpreter-lock
Also, from http://docs.python.org/library/threading.html:
CPython implementation detail: Due to the Global Interpreter Lock, in
CPython only one thread can execute Python code at once (even though
certain performance-oriented libraries might overcome this
limitation). If you want your application to make better use of the
computational resources of multi-core machines, you are advised to use
multiprocessing. However, threading is still an appropriate model if
you want to run multiple I/O-bound tasks simultaneously.
What does this mean? If your task isn't I/O-bound, threading won't gain you anything from the OS: only the thread that holds the global lock can execute Python code, and no other thread can run Python code until that lock is released. With I/O-bound tasks, the OS can schedule other threads, because the global lock is released while a thread waits for its I/O to complete. There is one caveat: you could be calling into code that does not fall under the GIL, and in that case threading will also perform well (hence the reference to "performance-oriented libraries" above).
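To make the I/O-bound case concrete, here is a minimal sketch. It assumes `time.sleep` is an acceptable stand-in for a blocking I/O call (it releases the GIL while waiting); `fake_io`, `run_sequential` and `run_threaded` are names made up for this illustration, and the exact timings will vary by machine.

```python
import threading
import time


def fake_io(delay):
    # Assumption: time.sleep stands in for a blocking I/O call. It releases
    # the GIL while waiting, so other threads can run in the meantime.
    time.sleep(delay)


def run_sequential(n, delay):
    """Call fake_io n times, one after another; return elapsed seconds."""
    start = time.time()
    for _ in range(n):
        fake_io(delay)
    return time.time() - start


def run_threaded(n, delay):
    """Call fake_io in n threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    print("sequential: %.2fs" % run_sequential(4, 0.2))
    print("threaded:   %.2fs" % run_threaded(4, 0.2))
```

The threaded run should finish in roughly the time of a single wait, while the sequential run takes about four times as long. Replacing the sleep with a pure-Python CPU loop would be expected to remove most of that advantage under CPython.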
Thankfully, Python makes managing the shared memory simple, and there is already good documentation on how to do so, though it took me a little while to find it. If you have any further questions, let us know.
In [83]: import _threading_local
In [84]: help(_threading_local)
Help on module _threading_local:
NAME
_threading_local - Thread-local objects.
FILE
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_threading_local.py
MODULE DOCS
http://docs.python.org/library/_threading_local
DESCRIPTION
(Note that this module provides a Python version of the threading.local
class. Depending on the version of Python you're using, there may be a
faster one available. You should always import the `local` class from
`threading`.)
Thread-local objects support the management of thread-local data.
If you have data that you want to be local to a thread, simply create
a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
You can also access the local-object's dictionary:
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
What's important about thread-local objects is that their data are
local to a thread. If we access the data in a different thread:
>>> log = []
>>> def f():
...     items = mydata.__dict__.items()
...     items.sort()
...     log.append(items)
...     mydata.number = 11
...     log.append(mydata.number)
>>> import threading
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[], 11]
we get different data. Furthermore, changes made in the other thread
don't affect data seen in this thread:
>>> mydata.number
42
Of course, values you get from a local object, including a __dict__
attribute, are for whatever thread was current at the time the
attribute was read. For that reason, you generally don't want to save
these values across threads, as they apply only to the thread they
came from.
You can create custom local objects by subclassing the local class:
>>> class MyLocal(local):
...     number = 2
...     initialized = False
...     def __init__(self, **kw):
...         if self.initialized:
...             raise SystemError('__init__ called too many times')
...         self.initialized = True
...         self.__dict__.update(kw)
...     def squared(self):
...         return self.number ** 2
This can be useful to support default values, methods and
initialization. Note that if you define an __init__ method, it will be
called each time the local object is used in a separate thread. This
is necessary to initialize each thread's dictionary.
Now if we create a local object:
>>> mydata = MyLocal(color='red')
Now we have a default number:
>>> mydata.number
2
an initial color:
>>> mydata.color
'red'
>>> del mydata.color
And a method that operates on the data:
>>> mydata.squared()
4
As before, we can access the data in a separate thread:
>>> log = []
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[('color', 'red'), ('initialized', True)], 11]
without affecting this thread's data:
>>> mydata.number
2
>>> mydata.color
Traceback (most recent call last):
...
AttributeError: 'MyLocal' object has no attribute 'color'
Note that subclasses can define slots, but they are not thread
local. They are shared across threads:
>>> class MyLocal(local):
...     __slots__ = 'number'
>>> mydata = MyLocal()
>>> mydata.number = 42
>>> mydata.color = 'red'
So, the separate thread:
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
affects what we see:
>>> mydata.number
11
>>> del mydata
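The thread-local behaviour described in the help text above can be condensed into a short standalone sketch. It is written so that it should also run on Python 3; `worker`, `seen` and `seen_lock` are hypothetical names used only for this illustration.

```python
import threading

mydata = threading.local()
mydata.number = 42          # set in the main thread only

seen = {}                   # ordinary dict: shared by all threads
seen_lock = threading.Lock()


def worker(value):
    # The attribute set by the main thread is not visible in this thread.
    had_number = hasattr(mydata, "number")
    mydata.number = value   # visible only to this thread
    with seen_lock:
        seen[value] = (had_number, mydata.number)


threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)            # each worker found no "number" at first, then saw its own value
print(mydata.number)   # the main thread's value is untouched
```

Note the contrast: `seen` is a plain object and is shared by every thread, while attributes on `mydata` are kept per thread.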
And just in case, here is an example using your style above.
In [40]: import datetime
    ...: import random
    ...: import threading
    ...:
    ...: class TestThread(threading.Thread):
    ...:     report = list()  # class attribute: shared across threads
    ...:     def __init__(self):
    ...:         threading.Thread.__init__(self)
    ...:         self.io_bound_variation = random.randint(1, 100)
    ...:     def run(self):
    ...:         start = datetime.datetime.now()
    ...:         print('%s - io_bound_variation - %s' % (self.name, self.io_bound_variation))
    ...:         # rewrite the same file io_bound_variation times to simulate I/O-bound work
    ...:         for _ in range(0, self.io_bound_variation):
    ...:             with open(self.name, 'w') as f:
    ...:                 for i in range(10000):
    ...:                     f.write(str(i) + '\n')
    ...:         print('%s - finished' % (self.name))
    ...:         end = datetime.datetime.now()
    ...:         print('%s took %s time' % (self.name, end - start))
    ...:         self.report.append(end - start)
    ...:
And a run of three threads with output.
In [43]: threads = list()
    ...: for i in range(3):
    ...:     t = TestThread()
    ...:     t.start()
    ...:     threads.append(t)
    ...:
    ...: for thread in threads:
    ...:     thread.join()
    ...:
    ...: for thread in threads:
    ...:     print(thread.report)
    ...:
Thread-28 - io_bound_variation - 76
Thread-29 - io_bound_variation - 83
Thread-30 - io_bound_variation - 80
Thread-28 - finished
Thread-28 took 0:00:08.173861 time
Thread-30 - finished
Thread-30 took 0:00:08.407255 time
Thread-29 - finished
Thread-29 took 0:00:08.491480 time
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
[datetime.timedelta(0, 5, 733093), datetime.timedelta(0, 6, 253811), datetime.timedelta(0, 6, 440410), datetime.timedelta(0, 4, 342053), datetime.timedelta(0, 5, 520407), datetime.timedelta(0, 5, 948238), datetime.timedelta(0, 8, 173861), datetime.timedelta(0, 8, 407255), datetime.timedelta(0, 8, 491480)]
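You may wonder why report has more than three elements. That is because I ran the above for loop code three times in my interpreter, and report is a class attribute, so every run (and every thread) appends to the same list. That is also why all three threads print the same list. To fix this "bug", I need to reset the shared variable to an empty list before each run: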
TestThread.report = list()
This illustrates why threads can become unwieldy.
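One way to sidestep the stale shared list is to give each run its own results list instead of using a class attribute. The following is only a sketch of that idea, not the code from the session above: `Worker` and `run_batch` are hypothetical names, and the doubling stands in for whatever real work the thread would do. The lock is there to make the sharing explicit.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, results, lock, value):
        threading.Thread.__init__(self)
        self.results = results   # list owned by the caller, one per run
        self.lock = lock
        self.value = value

    def run(self):
        outcome = self.value * 2  # placeholder for the real work
        with self.lock:
            self.results.append(outcome)


def run_batch(values):
    """Start one Worker per value and return that run's results only."""
    results = []
    lock = threading.Lock()
    threads = [Worker(results, lock, v) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_batch([1, 2, 3]))
print(run_batch([1, 2, 3]))
```

Because `results` is created inside `run_batch`, running it a second time starts from an empty list, so nothing has to be reset by hand between runs.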