I am trying to build a program in Python that relies on multiple threads, with data shared between the threads. I am trying to avoid doing this with the global keyword, but am not getting anywhere so far.

As a simple example (code below), my main() function spawns one thread, thread1, which should be able to access the variable count, in this case just to print it. At the same time, main() increments this variable, and thread1 should be able to see count changing. Some self-contained code here:

import threading
import time

class myThread (threading.Thread):

    def __init__(self, threadID):
        self.threadID = threadID
        threading.Thread.__init__(self)

    def run(self):
        global count
        for i in range(10):
            print("count is now: ", count, " for thread ", self.threadID, "\n")
            time.sleep(5)


def main():

    global count
    count = 0

    # spawn one or more threads
    thread1 = myThread(1)
    thread1.start()

    for i in range(20):
        time.sleep(2)
        count = count + 1

    # wait for thread1 to finish
    thread1.join()

main()

When reading about threads in Python, I haven't found any way to do this other than using global. However, when reading about global, most people say you should very rarely use it, for good reasons, and some people even think it should be removed from Python altogether. So I am wondering if there is actually an alternative way of getting thread1 to "passively" detect that main() has incremented count, and access that new value? E.g. I don't know much about Python and pointers (do they even exist in Python?), but I would in any case assume this is exactly what global achieves.

Ideally I would be able to call a thread1 method from main() to set a new self.count whenever count is incremented, but as thread1 has a run() method that is blocking, I can't see how to do this without having another independent thread inside thread1, which seems too complicated.

Chris Seymour
funklute
  • Use a `threading.Lock` to control access to the global. To avoid a global, derive a `threading.Thread` counter subclass that acquires and releases its own private Lock attribute as necessary to prevent two threads from accessing the internal counter value at the same time. – martineau Dec 04 '12 at 10:01
  • Actually there's no reason to derive the counter subclass from `threading.Thread` for your problem. All it needs is to be able to be used by more than one thread at a time, which it can do with its Lock member variable. – martineau Dec 04 '12 at 10:14
  • Good point @martineau, just for future readers: I decided not to use locks here mainly because I am only changing the variable from main(), and changing the variable in question is an atomic operation, i.e. thread1 will never read a halfway-incremented count. (I'm not an expert though, so correct me if I'm wrong) – funklute Dec 04 '12 at 11:30
  • Incrementing the counter is not an atomic operation. The thread doing so has to first read in the current value, increment it, and then store it back. Other threads could in theory read the value at any of these stages. This could be a problem, for example, if it was being used as an index into another data structure. – martineau Dec 04 '12 at 11:41
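
The locking approach suggested in the comments above could look roughly like the following. This is only a sketch written for Python 3, not code from the thread: the `SharedCounter` class, the `worker` function, and the `run_demo` helper are hypothetical names introduced for illustration. The counter does not derive from `threading.Thread`; it only carries its own `threading.Lock`.

```python
import threading


class SharedCounter:
    """Hypothetical helper: an integer guarded by its own private lock."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def increment(self):
        # The whole read-modify-write sequence runs while holding the lock,
        # so no other thread can interleave between the read and the store.
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self):
        # Reads take the lock too, so they never overlap with an update.
        with self._lock:
            return self._value


def worker(counter, n):
    # Each worker increments the same counter object n times.
    for _ in range(n):
        counter.increment()


def run_demo(num_threads=4, n=10000):
    counter = SharedCounter()
    threads = [threading.Thread(target=worker, args=(counter, n))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_demo())  # num_threads * n increments in total
```

Because the counter object is passed to each thread as an argument, no `global` statement is needed, and every access goes through the lock.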

1 Answer

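You can pass the shared object to the thread's constructor and store it as an instance attribute. The thread then works on the same object as the caller, not on a copy.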

import threading
class MyThreadClass(threading.Thread):
  def __init__(self, fruits):
    threading.Thread.__init__(self)
    self.fruits = fruits    

  def run(self):
    self.fruits.append('banana')  

list_fruit = ['apple', 'orange']
print('BEFORE:', list_fruit)
thread = MyThreadClass(list_fruit)
thread.start()
thread.join()  # Wait for the thread to finish
print('AFTER:', list_fruit)

Output:

BEFORE: ['apple', 'orange']
AFTER: ['apple', 'orange', 'banana']

For your case, you can try:

import threading
import time

class myThread (threading.Thread):

    def __init__(self, threadID):
        self.threadID = threadID
        self.count = 0
        threading.Thread.__init__(self)

    def run(self):
        for i in range(10):
            print("count is now: ", self.count, " for thread ", self.threadID, "\n")
            time.sleep(5)


def main():
    # spawn one or more threads
    thread1 = myThread(1)
    thread1.start()

    for i in range(20):
        time.sleep(2)
        # update the attribute on the thread object instead of a global
        thread1.count = thread1.count + 1

    # wait for thread1 to finish
    thread1.join()
    print(thread1.count)

main()

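If you want to share the same count between multiple threads, you can put the count in a list containing just one element. A list is mutable, so when you assign it to each thread's attribute, every thread holds a reference to the same list object rather than its own copy, and a change made to the element is visible to all of them.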
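
A minimal sketch of the one-element-list idea, written for Python 3. The `Reader` class and the `demo` function are hypothetical names used only for illustration, and the value is changed before the threads start so the result is deterministic. It shows the sharing only; it does not add any locking, which concurrent updates would still need.

```python
import threading


class Reader(threading.Thread):
    """Hypothetical thread that records the shared value it sees."""

    def __init__(self, thread_id, shared):
        super().__init__()
        self.thread_id = thread_id
        self.shared = shared   # a reference to the caller's list, not a copy
        self.seen = None

    def run(self):
        # Read the current value out of the shared one-element list.
        self.seen = self.shared[0]


def demo():
    count = [0]                # one-element list used as a mutable box
    count[0] += 5              # mutate in place; never rebind `count` itself
    readers = [Reader(i, count) for i in range(3)]
    for r in readers:
        r.start()
    for r in readers:
        r.join()
    same_object = all(r.shared is count for r in readers)
    return [r.seen for r in readers], same_object


if __name__ == "__main__":
    print(demo())
```

The key point is that `count[0] += 5` changes the contents of the list, whereas `count = count + 5` on a plain integer would rebind the name in one place and leave the other references untouched.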

Omar
  • Although this gets rid of the global, in general it's not a good pattern to follow because it has two or more threads accessing shared data without some kind of locking mechanism in place. In the simple example it does little harm only because neither thread does anything very important with the data, but in realistic use cases the fact that access is not coordinated could be harmful. Also the part at the end about putting the count in a list makes no sense. – martineau Dec 04 '12 at 19:19
  • @Omar I guess you mean "GIL" ([Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)). But you should use a lock anyway, [see](http://stackoverflow.com/a/1718843/2291710). – Delgan Apr 13 '16 at 11:22
  • In case anyone wonders if you can pass a **queue** to communicate with the threads: give the queue to `__init__(self, queue)` and store it there with `self.queue = queue`, so that you can use the queue in the run() method. Although it is stored on `self`, all threads, as well as the code outside the class that created the instance, share the same queue afterwards. – user136036 Feb 14 '18 at 14:53
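
The queue suggestion in the last comment might be sketched as follows, assuming Python 3, where the module is named `queue` (it was `Queue` in Python 2). The `Consumer` class, the `_STOP` sentinel, and the `demo` function are hypothetical names chosen for this example. `queue.Queue` does its own locking, so no explicit `Lock` is used here.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells the worker to exit


class Consumer(threading.Thread):
    """Hypothetical thread that receives count updates through a queue."""

    def __init__(self, q):
        super().__init__()
        self.q = q             # the same Queue object that main() holds
        self.received = []

    def run(self):
        while True:
            item = self.q.get()    # blocks until something is put
            if item is _STOP:
                break
            self.received.append(item)


def demo(n=5):
    q = queue.Queue()
    consumer = Consumer(q)
    consumer.start()
    for count in range(1, n + 1):
        q.put(count)           # hand each new count value to the thread
    q.put(_STOP)
    consumer.join()
    return consumer.received


if __name__ == "__main__":
    print(demo())
```

With this design the thread does not poll a shared variable at all; it is handed each new value in order, which also addresses the wish in the question for main() to notify thread1 whenever count changes.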