
Because it is sometimes more practical than designing a solution around queues, I would like to write a simple wrapper that makes an iterator thread-safe. So far I have drawn inspiration from these topics and come up with two ideas:

Idea 1

import threading

class LockedIterator(object):
    def __init__(self, it):
        self._lock = threading.Lock()
        self._it = iter(it)
        if hasattr(self._it, 'close'):
            # Wrap close() so it also runs under the lock. The closure
            # takes no arguments; it captures self from the enclosing scope.
            def close():
                with self._lock:
                    self._it.close()
            self.close = close

    def __iter__(self):
        return self

    def next(self):
        with self._lock:
            return self._it.next()

What I don't like about it is that it gets a bit lengthy if I have to specify all possible methods - which, in fact, I can't - as the special case for generators shows. Also, some other iterator might have even more specific methods that have now become hidden.
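For illustration, here is a self-contained Python 3 sketch of this idea in use (`__next__` instead of `next`, and the `close` wrapper taking no arguments); the `numbers` generator is just a stand-in:

```python
import threading

class LockedIterator:
    """Sketch of Idea 1: serialize next() and, if present, close()."""
    def __init__(self, it):
        self._lock = threading.Lock()
        self._it = iter(it)
        if hasattr(self._it, 'close'):
            def close():
                with self._lock:
                    self._it.close()
            self.close = close

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)

def numbers():  # stand-in generator
    yield from (1, 2, 3)

it = LockedIterator(numbers())
print(next(it))    # 1
it.close()         # forwarded to the generator, under the lock
print(list(it))    # [] -- a closed generator raises StopIteration
```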

Idea 2

class LockedIterator(object):
    def __init__(self, it):
        self._lock = threading.Lock()
        self._it = iter(it)

    def __getattr__(self, item):
        attr = getattr(self._it, item)
        if callable(attr):
            def hooked(*args, **kwargs):
                with self._lock:
                    return attr(*args, **kwargs)
            setattr(self, item, hooked)  # cache the wrapper for next time
            return hooked
        # non-callable attributes are deliberately hidden
        raise AttributeError(item)

This is more concise, but it can only intercept calls, and not, for example, direct property changes. (Those properties are now deliberately hidden to prevent problems.) More importantly, Python no longer recognizes my object as an iterator!
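The reason for that last problem is that Python looks up special methods such as `__iter__` and `__next__` on the type, not on the instance, so `__getattr__` is never consulted by the `iter()` and `next()` builtins. A minimal demonstration (Python 3 names, with a hypothetical `Proxy` class):

```python
class Proxy(object):
    """Forwards attribute access, but special methods bypass __getattr__."""
    def __init__(self, it):
        self._it = iter(it)

    def __getattr__(self, name):
        return getattr(self._it, name)

p = Proxy([1, 2, 3])
print(p.__next__())   # 1 -- explicit attribute access goes through __getattr__
try:
    next(p)           # but the builtin looks up __next__ on type(p) and fails
except TypeError as e:
    print(e)          # 'Proxy' object is not an iterator
```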

What is the best way to make this work for all iterators (or even better: all objects) without creating a leaky abstraction? I'm not too worried about locking when it isn't necessary, but if you can come up with a solution that circumvents that, great!

Thijs van Dien
  • Can't the thing being iterated still get mutated in between getting each bit with next? – GP89 Nov 19 '12 at 15:28
  • @GP89 I'm not sure what you're asking. The whole point of creating a locked iterator, is that I can use it safely among several threads, without having to work with queues. All these threads should be allowed to do anything with that iterator, except for adding/removing attributes, maybe. – Thijs van Dien Nov 19 '12 at 15:33
  • 1
    threading locks are themselves context managers, so you can simplify all the try-except-finally code down to just `with self._lock:` – PaulMcG Nov 19 '12 at 15:56
  • @PaulMcGuire Thanks! That cleans it up a bit. – Thijs van Dien Nov 19 '12 at 16:09
  • You could greatly improve the performance of Idea 2 by keeping a cache of hooked access functions already created. In fact, something similar to this [fast memoization decorator](http://code.activestate.com/recipes/578231-probably-the-fastest-memoization-decorator-in-the-/?in=lang-python) might be a good fit. – martineau Nov 19 '12 at 16:56
  • @martineau I did in fact play with the idea of setting the `hooked` method as an attribute of the class itself; I updated my question. Does your solution have anything to offer over that? – Thijs van Dien Nov 19 '12 at 17:05
  • 1
    I'm at a loss of trying to figure out when you'd need to share an iterator across threads, instead of sharing a concurrent collection. The whole design makes me feel uneasy - trying to avoid the need to use a proper queue seems like a very leaky abstraction already. – millimoose Nov 19 '12 at 19:13
  • @millimoose I can tell you about my particular use. I wrote a multithreaded tool to guess passwords. The passwords to try are provided by a generator. I would like to have the threads directly ask the generator for a next password to try. That way, I can also just use `close()` on the generator to stop further attempts if, for example, the correct one is found. Altogether, this simplifies a lot of things - no need for yet another thread that copies from a generator to a queue and puts sentinels when done. A generator is all I need, as long as it doesn't get messed up because of multithreading. – Thijs van Dien Nov 19 '12 at 19:22
  • My point was, locking just for the `next` isn't thread safe. If the object is mutated between nexts, you'll get a `RuntimeError`. You need to lock while you iterate over the entire structure to prevent this (for instance if you used your `LockedIterator` on a `list`/`deque`). (By the way, it sounds like you should be using a queue; you can stop threads getting items by calling a stop method on them, for instance.) – GP89 Nov 19 '12 at 20:12
  • @GP89 The generator only changes when `next()` is called, and while that call is made, it is locked. Please give me an example of the kind of mutation you are talking about then, because I still don't see what you mean. – Thijs van Dien Nov 19 '12 at 20:16
  • In one thread `sum(1 for item in locked_iterator)`, assuming the iterator is an iterator of a list, and another thread appending items. This might not be a problem for your use case, but if you plan to re-use these locked iterators just be aware :) – GP89 Nov 19 '12 at 20:19
  • @GP89 I think such issues concern any shared (re)use of generators, regardless of threads. Anyway, thanks for the heads up. – Thijs van Dien Nov 19 '12 at 20:24
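GP89's warning is easy to reproduce with a container that detects concurrent modification, such as `collections.deque` (a plain `list` tolerates appends mid-iteration). Locking each `next()` call does nothing to stop another thread from mutating the underlying container between calls:

```python
from collections import deque

d = deque([1, 2, 3])
it = iter(d)
next(it)        # start iterating
d.append(4)     # another thread could do this between next() calls
try:
    next(it)
except RuntimeError as e:
    print(e)    # deque mutated during iteration
```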

1 Answer


First, are you aware of the GIL? Attempts at multi-threaded Python typically end up with slower run times than a straightforward single-threaded version.

Your first attempt at making access to an iterator thread-safe seems quite reasonable. You can make it a bit more readable by using a generator:

def locked_iter(it):
    it = iter(it)
    lock = threading.Lock()
    while True:
        try:
            with lock:
                value = next(it)
        except StopIteration:
            return
        yield value
user4815162342
  • I am aware of the benefits and limitations of threads in Python. In my case, I have to deal with network latency. Your solution seems nice (and you deserve my upvote), but still it does not offer a way to keep any of the other attributes a particular kind of iterator might have. I decided not to ask my question in terms of one use case I have now, especially because I would like the solution to be more general. – Thijs van Dien Nov 19 '12 at 20:26
  • @tvdien If the iterator has other attributes, then it's not really an iterator, interface-wise — in that case we are talking about proxying arbitrary objects. This is possible in Python, but it requires tricky code to handle special methods—the only way to make it work is by creating classes on-the-fly. The result is slowish, hard to maintain, and rarely worth the effort. – user4815162342 Nov 19 '12 at 20:52
  • @tvdien I'm trying to understand the "network latency" requirement. How do Python threads help you with network latency? And what is wrong with using thread-safe queues, the standard multithreaded idiom? – user4815162342 Nov 19 '12 at 20:53
  • It is my understanding that any object having both `next` and `__iter__` methods can be considered an iterator. As mentioned above, my current project is a password breaker. Since it works over SOAP, there is a considerable delay between request and response. While I'm waiting for one response, I can send the next request. The speed increased significantly. As to why I'd like to avoid queues, that too is described in the conversation above. Is it better to create a variable and not `yield next(it)`? – Thijs van Dien Nov 19 '12 at 21:02
  • 2
    @tvdien Existence of `__iter__` makes an object an *iterable*, capable of producing iterator(s). An *iterator* actually produces values, and as such only the `next` method. `yield next(it)` would be incorrect because it would inadvertently catch `StopIteration` raised where the generator is being used. – user4815162342 Nov 19 '12 at 21:23
  • @tvdien It is not at all clear from the conversation above why you are avoiding thread-safe queues. They are designed for exactly the kind of program that you are writing. – user4815162342 Nov 19 '12 at 21:25
  • Yet some objects refer to themselves with `__iter__`. You're saying that those should not have any other methods than `next`? The reason I'm avoiding a queue for input - I do use one for the results - is that it would introduce indirection and the need for quite some code that is rather meaningless, only to make it work. The generator is much more concise. – Thijs van Dien Nov 19 '12 at 21:30
  • I don't see the big deal with avoiding queues for this very simple case where I only take values. In the end, queues have locks as well... – Thijs van Dien Nov 19 '12 at 21:39
  • 1
    @user4815162342, given that Python is duck typed and based on [The Python Standard Library](http://docs.python.org/2/library/stdtypes.html#iterator-types) I think maybe you are taking the "is it an iterator" argument a little too literally. I would say that any object that supplies `next` and `__iter__` method as describe in the protocol meets the definition of an iterator. – JimP Nov 19 '12 at 21:51
  • @tvdien I'm not saying that they shouldn't have methods other than `next`, but that those are of no concern when discussing "wrapping iterators". If you are instead talking about wrapping arbitrary objects, that's a different matter. – user4815162342 Nov 19 '12 at 22:06
  • Alright then. My first concern was to wrap pure iterators, but to generalize my solution to any iterator, I would've liked it to work for any object, which turns out to be rather challenging. I'm willing to accept your answer, mostly because it points that out. I don't quite see how skipping the `value` variable and doing a `yield next(it)` would have any consequences for the `StopIteration` exception, though. An example of that would still be appreciated. In my own tests, both seem to work just fine. – Thijs van Dien Nov 19 '12 at 22:11
  • @tvdien A bigger problem with `yield next(it)` is that you're yielding from inside the `with` statement, which means the lock is held much longer than necessary - during the entire loop iteration instead of just the iterator fetch. – user4815162342 Nov 19 '12 at 22:27
  • @tvdien You're raising a good point; I tried to come up with code to demonstrate the `StopIteration` issue, but was unable to do so. Apparently exceptions raised outside the generator aren't propagated into the generator, but converted into `GeneratorExit`, which is used to enable a `finally` block to run. Still, `yield` should almost always be outside of `with`—using `yield` inside `with` is almost never what you want. (In this case, it would cause the entire iteration of the *caller* of the generator to run with the lock held.) – user4815162342 Nov 19 '12 at 22:40
  • Actually that was the reason why I asked whether it was intentional. Thanks for all your input! – Thijs van Dien Nov 19 '12 at 22:42
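The point about `yield` inside `with` can be demonstrated directly: a generator suspended at a `yield` inside the `with` block keeps the lock held while the consumer's loop body runs. A small sketch of the problematic variant, assuming a module-level lock:

```python
import threading

lock = threading.Lock()

def yield_inside_with(it):
    it = iter(it)
    while True:
        with lock:
            try:
                value = next(it)
            except StopIteration:
                return
            yield value   # suspended here with the lock still held

g = yield_inside_with([1, 2, 3])
next(g)
print(lock.locked())   # True: the consumer runs while the lock is held
g.close()              # GeneratorExit unwinds the with block...
print(lock.locked())   # False: ...releasing the lock
```

Moving the `yield` outside the `with`, as in the answer above, means the lock is held only for the duration of the underlying `next()` call.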