I'm reading this question which asks if generators are thread-safe, and one answer said:

It's not thread-safe; simultaneous calls may interleave, and mess with the local variables.

Another answer shows that you can use a lock to ensure that only one thread uses the generator at a time.

I'm new to multithreading. Can anyone devise an example that shows exactly what happens when you use the generator without a lock?

For example, it doesn't seem to have any problems if I do this:

import threading

def generator():
    for i in data:
        yield i

class CountThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        for i in gen:  # gen is already a generator object; calling gen() would raise TypeError
            print '{0} {1}'.format(self.name, i)

data = [i for i in xrange(100)]
gen = generator()
a = CountThread('a')
b = CountThread('b')
a.start()
b.start()
Haiyang
  • It's very hard to show an example of something not working due to threads, because you never know how much time a thread will get or what order they'll run in. It could be sheer luck that nothing bad happens. – Alec Teal Nov 18 '13 at 08:48
  • Seriously, stop asking the same question over and over. (http://stackoverflow.com/questions/20042534/python-why-different-threads-get-their-own-series-of-values-from-one-generator) If you are not sure about something, please come and move this conversation to the [Python Chat Room](http://chat.stackoverflow.com/rooms/6/python). – Inbar Rose Nov 18 '13 at 08:49
  • @InbarRose Are they the same question? I'm asking what happens when it fails. – Haiyang Nov 18 '13 at 08:57
  • @InbarRose: This question is fine, it's different from what he asked in your linked question. – justhalf Nov 18 '13 at 09:05

1 Answer

Run this example.

You'll see that the 10,000 numbers are "shared" across the threads: each number is handed to only one thread, so the two threads do not each see all 10,000.

In fact, with so few items it is most likely that a single thread will see all the numbers.

import threading

class CountThread(threading.Thread):
    def __init__(self, gen):
        threading.Thread.__init__(self)
        self.gen = gen
        self.numbers_seen = 0

    def run(self):
        for i in self.gen:
            self.numbers_seen += 1  # count the items this thread pulled


def generator(data):
    for item in data:
        yield item  # yield each element, not the whole sequence

gen = generator(xrange(10000))

a = CountThread(gen)
b = CountThread(gen)

a.start()
b.start()

a.join()
b.join()

print "Numbers seen in a", a.numbers_seen
print "Numbers seen in b", b.numbers_seen

If Python happens to switch threads while one of them is still executing inside the generator (a larger range than 10000, e.g. 10000000, makes this much more likely), the other thread's attempt to advance the generator raises an exception:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "test.py", line 10, in run
    for i in self.gen:
ValueError: generator already executing
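
Because the failure above depends on thread scheduling, it can take several runs to see it. The following is a minimal sketch, not part of the original answer, that forces the same situation on every run: the first thread is parked inside the generator body while the second thread tries to advance it. The names `slow_generator`, `entered`, `release`, `first` and `second` are hypothetical, the 5-second timeouts are only a safety net, and the behaviour shown is that of CPython.

```python
import threading

entered = threading.Event()  # set once the first thread is inside the generator body
release = threading.Event()  # set to let the first thread leave the generator body


def slow_generator():
    entered.set()     # signal that a thread is now executing this generator
    release.wait(5)   # stay inside the body until the other thread has tried next()
    yield 1


gen = slow_generator()
results = []
errors = []


def first():
    # Enters the generator and blocks inside it until `release` is set.
    results.append(next(gen))


def second():
    entered.wait(5)   # make sure the first thread is already inside
    try:
        results.append(next(gen))
    except ValueError as exc:
        errors.append(str(exc))
    finally:
        release.set()  # let the first thread finish


a = threading.Thread(target=first)
b = threading.Thread(target=second)
a.start()
b.start()
a.join()
b.join()

print(errors)
```

Here the second thread always ends up with `ValueError: generator already executing`, while the first thread still receives its value once it is released.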
Thomas Orozco
  • Interesting that if you use the built-in `iter()` function instead of your `generator()` function -- as in `gen = iter(xrange(100000000))` -- it seems to always make it all the way through without the exception. I'm not saying you're wrong -- perhaps the objects returned from `iter()` are thread-safe, although there's no mention of that in the documentation. – martineau Nov 18 '13 at 09:53
  • `iter()` isn't a generator, it's a completely custom iterator that maintains its own state (as opposed to a generator which has to do this far more generically) – Nick Bastin Nov 18 '13 at 11:01
  • @Nick Bastin: [A generator is just a function which returns an iterator](http://docs.python.org/2/glossary.html#term-generator), so since the `iter()` built-in function [returns an iterator object](http://docs.python.org/2/library/functions.html?highlight=iter#iter), by definition it _is_ a generator. – martineau Nov 18 '13 at 20:28
  • @martineau: Not really - it may look like a duck (generator) and quack like a duck (generator), but is implemented completely differently internally, and thus doesn't have the same issues as a generator created wholly in python (as `iter()` is implemented wholly in C). A python generator (one created using `yield`) inherits thread safety issues inherent to the CPython implementation, but as `iter()` is implemented entirely in C it can avoid the problem of having to be threadsafe while calling *back* into your python code to get the next value. – Nick Bastin Nov 18 '13 at 20:53
  • @NickBastin: Yeah, I know it's because `iter()` is implemented in C. As I originally commented, I just thought it was interesting. But now I'm wondering if there might be some way to inherit or otherwise leverage its thread-safety to make one's own pure-python generators created with `yield` also that way... – martineau Nov 18 '13 at 21:00
  • I ran your example code, and `a.numbers_seen+b.numbers_seen == 10000` always holds, although it sometimes throws `ValueError: generator already executing`. Maybe it is thread-safe when read-only? My Python version: Python 2.7.11 [GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin – xavierskip Aug 01 '16 at 06:17
  • @xavierskip I'm not sure I understand your point here. My answer indicated that the numbers would be "shared" across both threads, and that it was likely that a thread would crash because multiple threads can't be "in" the generator at the same time. That's exactly what you're encountering, and it certainly doesn't make generators thread-safe, since their behavior (exception or no exception) depends largely on thread scheduling. – Thomas Orozco Aug 01 '16 at 13:49
  • @ThomasOrozco Python has the GIL, so I guess some operations are atomic; maybe just reading a value from a list is thread-safe. I added the line `print a.numbers_seen + b.numbers_seen` at the end of your example code, and it always prints 10000, neither higher nor lower: the elements of the iterator are not counted twice or skipped. – xavierskip Aug 01 '16 at 15:19
  • @xavierskip I'm sorry, I think you're misunderstanding a few things here (or I'm misunderstanding what you're saying); and comments aren't appropriate to discuss this. You should open a new question if you're actually trying to ask something here. – Thomas Orozco Aug 01 '16 at 15:21
  • Oh, sorry for my poor English. But if it were thread-safe, what would your code print? – xavierskip Aug 02 '16 at 03:50
  • @xavierskip that really depends on your expectations, but not throwing an exception would be the least you'd hope for. – Thomas Orozco Aug 02 '16 at 15:01
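
The lock-based approach mentioned in the question, which also speaks to the comment above about making pure-Python generators safe to share, could look roughly like the sketch below. This is an assumption about how such a wrapper might be written, not the code from the linked answer; `LockedIterator` and `count` are hypothetical names. Each thread writes only to its own slot of `seen`, so the counters themselves are not shared state.

```python
import threading


class LockedIterator(object):
    """Hypothetical helper: lets only one thread at a time advance the iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock is held for the whole next() call, so two threads can
        # never be executing inside the wrapped generator simultaneously.
        with self._lock:
            return next(self._it)

    next = __next__  # Python 2 spelling of the iterator protocol


def generator(data):
    for item in data:
        yield item


def count(shared, seen, index):
    # Each worker increments only its own slot.
    for _ in shared:
        seen[index] += 1


shared = LockedIterator(generator(range(10000)))
seen = [0, 0]
threads = [threading.Thread(target=count, args=(shared, seen, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(seen)
print(total)
```

With the wrapper in place, neither thread should hit `generator already executing`, and every item is consumed exactly once, so the two counters add up to 10000. How the items are split between the threads still depends on scheduling.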