
I've read all the documentation on the subject, but it seems I can't grasp the whole concept of Python coroutines well enough to implement what I want to do.

I have a background task (which generates some random files, but that doesn't much matter), and it does this in an infinite loop (this is a watcher).

I would like to implement this background task in the most efficient way possible, and I thought that microthreads (aka coroutines) were a good way to achieve that, but I can't get it to work at all: either the background task runs or the rest of the program does, but not both at the same time!

Could someone give me a simple example of a background task implemented using coroutines? Or am I being mistaken in thinking that coroutines could be used for that purpose?

I am using Python 2.7 native coroutines.

I am well versed in concurrency, particularly with DBMSes and Ada, so I know a lot about the underlying principles, but I'm not used to the generator-as-coroutines concept, which is very new to me.

/EDIT: here is a sample of my code, which I must emphasize again is not working:

@coroutine
def someroutine():
    with open('test.txt', 'a') as f:
        f.write('A')
    while True:
        pass
    yield 0

@coroutine
def spawnCoroutine():
    result = yield someroutine()

    yield result

routine = spawnCoroutine()
print 'I am working in parallel!'

# Saves 'A' in the file test.txt, but never prints 'I am working in parallel!'

Note: @coroutine is a decorator from coroutine.py provided by David Beazley

/FINAL EDIT AND SOLUTION RECAP

OK, my question was closed because it was seemingly ambiguous, which, as a matter of fact, was the very point of my question: to clarify the usage of coroutines versus threading and multiprocessing.

Luckily, a nice answer was submitted before the dreaded sanction occurred!

To emphasize the answer to the above question: no, neither Python's coroutines nor bluelet/greenlet can be used to run an independent, potentially infinite CPU-bound task, because there is no parallelism with coroutines.

This is what confused me the most. Indeed, parallelism is a subset of concurrency, and thus it is rather confusing that the current implementation of coroutines in Python allows for concurrent tasks, but not for parallel tasks! This behaviour is to be clearly differentiated from the Tasks concept of concurrent programming languages such as Ada.

Also, Python's threads are similar to coroutines in that they generally switch context when waiting for I/O, and thus are also not a good candidate for independent CPU-bound tasks (see David Beazley's Understanding the GIL).

The solution I'm currently using is to spawn subprocesses with the multiprocessing module. Spawning background processes is heavy, but it's better than running nothing at all. It also has the advantage of allowing the computation to be distributed.

As an alternative, on Google App Engine there are the deferred module and the background_thread module, which can offer interesting alternatives to multiprocessing (for example via some of the libraries that implement the Google App Engine API, like typhoonae, although I'm not sure they have implemented these modules yet).

gaborous
  • Where's your code? (It should be here). – Marcin Nov 14 '12 at 19:56
  • Are you doing coroutines on top of Python 2.x generators, 3.x generators, stackless (or PyPy) native coroutines, or something different? If 2.x, have you worked through http://www.dabeaz.com/coroutines/index.html? There's tons of code samples, and of course the whole thing is designed to get you to grasp the concept. – abarnert Nov 14 '12 at 19:57
  • Coroutines alone aren't sufficient for this. However, they lend themselves well to building cooperative multitasking. But seeing as you appear quite confused about concurrency, you probably shouldn't try to build this yourself -- it's hard enough already. – Nov 14 '12 at 19:57
  • I added a bit more information; my bad for forgetting to specify my Python version and tools used (no framework, in fact). Also, I did read the source code and watch the presentation, but it seems coroutines are mainly used to process I/O asynchronously. So maybe what I want to do is impossible, but I thought it should be possible, hence why I am asking. @delnan: I know a lot about the underlying principles of concurrency, just not about generators-as-coroutines, hence I must confess that I am indeed quite confused about it. – gaborous Nov 14 '12 at 20:01
  • Show us some sort of code. The only reference to coroutines in the Python 2.7 docs I could find was some offhand reference to them in the discussion of yield expressions. So I have no idea what you mean by "Python 2.7 native coroutines". – John Gaines Jr. Nov 14 '12 at 20:14
  • @JohnGainesJr. The term is common at least on the `python-*` mailing lists, and refers to "generators" that communicate by using the `res = yield foo` (and now, `res = yield from foo`) constructs. The term also dates back to the original PEP which introduced these features. – Nov 14 '12 at 20:16
  • @user1121352 Concurrent is not exactly the same as parallel, which is what you seem to be asking for. Coroutines based on `yield/next()/send()` aren't by themselves parallel unless you mix them with threads or greenlets. – lqc Nov 14 '12 at 20:17
  • You may be interested in the threads (about two dozen by now) on async APIs on the `python-ideas` mailing list. Several designs revolve around using coroutines to implement cooperative multi-tasking, there's some code implementing a scheduler, various models of communication are discussed. It starts with "asyncore: included batteries don't fit" in early October (IIRC the topic existed earlier, but that's the earliest mail I could find in my inbox) and goes on for a long time. –  Nov 14 '12 at 20:23
  • For this particular task you probably should not use co-routines, but real threads. – Hans Then Nov 14 '12 at 20:34
  • … and if you _do_ have a good reason to use cooperative threads, you probably should use a cooperative thread library, rather than building it yourself directly on top of generators. – abarnert Nov 14 '12 at 21:28

2 Answers


If you look at the (trivial) coroutine.py library you're using, it includes an example that shows how grep works "in the background". There are two differences between your code and the example:

  1. grep repeatedly yields while doing its work—in fact, it yields once per line. You have to do this, or nobody but your coroutine gets a chance to run until it's finished.

  2. the main code repeatedly calls send on the grep coroutine, again once per line. You have to do this, or your coroutines never get called.

This is about as trivial a case as possible—a single coroutine, and a trivial dispatcher that just unconditionally drives that one coroutine.
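That trivial case can be sketched with a bare generator (illustrative names, not coroutine.py's own grep; it collects matches in a list instead of printing, and primes with next() by hand instead of via the @coroutine decorator):

```python
# A bare-generator sketch of the grep pattern described above: the
# coroutine yields once per line, and the driver calls send() once per
# line. Names are illustrative, not taken from coroutine.py itself.
def grep(pattern, out):
    while True:
        line = yield          # give up control until the driver sends a line
        if pattern in line:
            out.append(line)  # collect matches instead of printing them

found = []
g = grep('python', found)
next(g)                       # prime the coroutine (what @coroutine automates)
for line in ['python rocks', 'java too', 'python again']:
    g.send(line)              # resume the coroutine, one line at a time
```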

Here's how you could translate your example into something that works:

@coroutine
def someroutine():
    with open('test.txt', 'a') as f:
        yield
        f.write('A')
    while True:
        yield
    yield 0

routine = someroutine()
print 'I am working in parallel!'
routine.send(None)  # @coroutine already primed it; send(None) resumes it
print 'But only cooperatively...'
routine.send(None)

And so on.

But normally you don't want to do this. In the case of the grep example, the coroutine and the main driver are explicitly cooperating as a consumer and producer, so that direct coupling makes perfect sense. You, on the other hand, just have some completely independent tasks that you want to schedule independently.

To do that, don't try to build threading yourself. If you want cooperative threading, use an off-the-shelf dispatcher/scheduler, and the only change you have to make to all of your tasks is to put in yield calls often enough to share time effectively.
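For illustration only, here's a toy round-robin dispatcher (a hypothetical stand-in for a real scheduler library, not something you should build for production): each task is a plain generator that yields whenever it is willing to give up the CPU.

```python
# Toy cooperative scheduler: resume each task in turn until all finish.
# 'counter' is an illustrative task that records its progress in a list.
def counter(name, n, log):
    for i in range(n):
        log.append((name, i))
        yield                    # cooperative: let the other tasks run

def run_all(tasks):
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)   # not finished: reschedule at the back
        except StopIteration:
            pass                 # finished: drop it

log = []
run_all([counter('a', 2, log), counter('b', 2, log)])
# log now interleaves the two tasks: a0, b0, a1, b1
```

The point is the same as the grep example, only generalized: nothing runs in parallel, but every task that yields often enough gets its share of time.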

If you don't even care about the threading being cooperative, just use threading or multiprocessing, and you don't even need the yields:

import threading

def someroutine():
    with open('test.txt', 'a') as f:
        f.write('A')
    while True:
        pass
    return 0

routine = threading.Thread(target=someroutine)
routine.start()
print 'I am working in parallel!'

PS, as I said in one of the comments, if you haven't worked through http://www.dabeaz.com/coroutines/index.html or an equivalent, you really should do that, and come back with any questions you find along the way, instead of writing code that you don't understand and asking why it doesn't work. I'm willing to bet that if you make it to part 4 (probably even earlier), you'll see why your initial question was silly.

abarnert
  • Complete and precise answer. Indeed my task is completely independent, and I thought that Python coroutines, which are also described as Tasks (even in the David Beazley talk!), worked the same as Tasks in other concurrent programming languages such as Ada. Oh boy, was I mistaken! In fact it's only concurrent programming, which switches context when I/O is performed, a bit like current Python threads release the GIL on I/O. Also this [post helped me to understand the difference between IO bound tasks and CPU bound tasks](http://stackoverflow.com/a/8994790/1121352). – gaborous Nov 15 '12 at 15:15
  • Thanks to your explanations, I will be using the `multiprocessing` module, which is a lot more appropriate for my case (a fully independent CPU-bound task). – gaborous Nov 15 '12 at 15:16
  • One last comment I would like to add: in fact Python's coroutines and Bluelets/Greenlets are used to achieve **concurrency without parallelism**. Everything is in fact done sequentially (we just switch context when a coroutine waits for I/O). Indeed this greatly simplifies most of the concurrency code for event-driven applications such as GUIs or web apps, since there's no messy parallelism to deal with. But it is limiting when you really need parallelism. Discovering that fact was the most enlightening to me. – gaborous Nov 15 '12 at 16:21
  • @user1121352: Most people (especially people raised on the Microsoft-/Java-style "multithread everything possible" style) don't appreciate the distinction, or why it's useful to make it. (And once you get the issues, you can see why you may want to mix and match—e.g., to run 5000 long-lived tasks concurrently while making efficient use of 8 cores, you might run greenlets over a thread or process pool…). – abarnert Nov 16 '12 at 20:48

while True: pass

ENDLESS LOOP.

So it will never execute the yield after that. In fact that loop is the real end of the function; everything after it is pure useless decoration.

And since someroutine gets STUCK before it can yield (pun intended ;) ), yield someroutine() will also never yield.

So your script ends up busily doing nothing (an infinite empty loop).

przemo_li
  • Yes, that's an example; inside the loop I could be doing other stuff, but the goal is that this loop does not stop the rest of the script. So I guess that's not possible using coroutines? – gaborous Nov 14 '12 at 21:09
  • No no no!!!! If you do an infinite loop, the yield will not be executed!!! Move the yield INTO the loop. – przemo_li Nov 26 '12 at 22:51