
Background

I once worked on a Python 2 system that had a lot of custom I/O code written synchronously, and was scaled using threads. At some point, we couldn't scale it any further, and realised we had to switch to asynchronous programming.

  • Twisted was the popular choice, but we wanted to avoid its callback hell.
  • It did have the @inlineCallbacks decorator, which effectively implemented coroutines using generator magic, as did some other libraries. That was more tolerable, but felt a bit flaky.
  • And then we found gevent. All you had to do was:
from gevent import monkey
monkey.patch_all()

And just like that, all your standard I/O - sockets, database transactions, everything written in pure Python, really - was asynchronous, yielding and switching behind the scenes using greenlets.
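The "generator magic" behind @inlineCallbacks can be illustrated without any third-party package: each `yield` hands control back to a scheduler, which resumes the generators round-robin. This is a toy, stdlib-only sketch, not how Twisted or gevent are actually implemented (gevent switches greenlets inside monkey-patched calls rather than at explicit `yield`s); `worker` and `run` are hypothetical names.

```python
from collections import deque


def worker(name, steps, log):
    # Each `yield` marks a point where this generator-based "coroutine"
    # hands control back to the scheduler, standing in for a blocking I/O call.
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield


def run(tasks):
    # Round-robin scheduler: resume each generator until its next yield,
    # then put it back at the end of the queue; drop it once it finishes.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        ready.append(task)


log = []
run([worker("a", 2, log), worker("b", 2, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1']
```

The interleaved log shows the cooperative switching that both gevent and asyncio rely on; the difference discussed below is whether those switch points are visible in the code.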

It wasn't perfect:

  • Back then, it didn't work well on Windows (and it still has some limitations today). Luckily, we were running on Linux.
  • It couldn't monkey-patch C extensions, so we couldn't use MySQLdb, for example. Luckily, there were many pure Python alternatives, like PyMySQL.

Question

Nowadays, Python 3 is much more popular, and with it, asyncio. Personally, I think it's great, but I was recently asked in what ways it differs from what we implemented with gevent, and couldn't come up with a good enough answer.

This might sound subjective, but I'm actually looking for real use-cases where one would significantly outperform the other, or allow something that the other does not. Here are the considerations I've gathered so far:

  1. Like I said, gevent is rather limited on Windows. Then again, most production code I know of runs on Linux.

    If you need to run on Windows, use asyncio.

  2. gevent can't monkey-patch C extensions. But asyncio can't monkey-patch anything.

    Imagine that a new DB technology comes up, and you'd like to use it, but there isn't a pure Python library for it, so you can't integrate it with gevent. The thing is, you're just as stuck when there isn't an io* library that you can integrate with asyncio! There are worker threads and executors, of course, but that's beside the point, and they work just as well in both cases anyway.

  3. Some people say it's a matter of personal taste, but I think it's fair to say that synchronous programming is inherently easier than asynchronous programming (think about it: have you ever met a novice programmer who can work with sockets, but has a hard time understanding how to properly select/poll them, or thinking in futures/promises? And have you ever met the reverse?).

    Anyway, let's not go there. I wanted to address this point because it comes up frequently (here's a discussion on reddit), but what I'm really after is scenarios where you have a practical reason to use one or the other.

  4. Asyncio is part of the standard library. That's huge: it means it's well maintained, well documented, and everybody knows about it and uses it by default.

    But, considering how little of Gevent you need to know to use it (and that it's pretty well maintained and documented as well), it doesn't seem as crucial. So while there are multiple answers on StackOverflow for even the most complicated scenarios involving futures, the possibility to not use futures at all seems just as viable.
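For point 2, the worker-thread escape hatch looks roughly like this on the asyncio side. This is a minimal sketch: `blocking_query` is a hypothetical stand-in for a call into a driver with no async API, simulated with `time.sleep`. `loop.run_in_executor(None, ...)` hands each call to the loop's default thread pool, and `asyncio.gather` waits for all of them while the event loop stays free to run other tasks.

```python
import asyncio
import time


def blocking_query(x):
    # Hypothetical stand-in for a C-extension driver call with no async API.
    time.sleep(0.1)
    return x * 2


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; each blocking
    # call runs in a worker thread instead of on the event loop.
    futures = [loop.run_in_executor(None, blocking_query, i) for i in range(3)]
    return await asyncio.gather(*futures)


results = asyncio.run(main())
print(results)  # [0, 2, 4]
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which thread finishes first.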

Surely Guido and the Python community had a good reason to put so much effort into Asyncio, and even introduce new keywords into the language - I just can't seem to find them.

What are the key differences between the two, and in what scenarios do they become apparent?

ArcLight_Slavik
Dan Gittik
  • I'm not convinced this is objective, but I'm not going to vote to close as "primarily opinion". I would suggest that you could cut a _lot_ of the chatty aspects to make this much more objective, though. – roganjosh Jan 18 '19 at 12:48
  • For example: "Sure, it's nice to have multiple answers on StackOverflow for even the most complicated scenarios involving futures - but it's nicer still to not have to use futures at all." how is that _not_ subjective? – roganjosh Jan 18 '19 at 12:50
  • Thanks for understanding! I feel safe saying that "asynchronous programming is harder than synchronous programming" or that "futures are an advanced feature and it's nice to not have to use them" because I was encouraged by a question about concrete uses for metaclasses (https://stackoverflow.com/questions/392160/what-are-some-concrete-use-cases-for-metaclasses/31061875#31061875). Not unlike asyncio, it's a standard and useful feature in Python, but it's relatively fair to say that it's more advanced than the alternatives discussed. – Dan Gittik Jan 18 '19 at 12:55
  • Eep, ok. You've clearly given a very detailed answer there, but the standards of SO have also changed _a lot_ since that question was asked. Still only my opinion, but I worry you'll face close votes on this unless you cut it back; of course, it's for others to decide, and for you to decide whether you want to change the text. In any case, I'm not in a position to give you a suitable answer, though. – roganjosh Jan 18 '19 at 12:59
  • I appreciate your candour. I'll try to cut it back as much as I can. – Dan Gittik Jan 18 '19 at 13:03


"Simple" answer from real-world usage:

  1. The good thing about gevent: you can patch things, which means that you can [theoretically] use synchronous libraries, e.g. you can patch Django.
  2. The bad thing about gevent: not everything can be patched. If you must use some DB driver that can't be patched, you're doomed.
  3. The worst thing about gevent: it's "magical". The amount of effort required to understand what happens with "patch_all" is enormous, and the same effort applies to finding/hiring new people for your dev team. What is even worse: debugging gevent-based code is hell. I'd say it's pretty much the same hell as callbacks, if not worse.

The latter point is key, I think. The most underestimated thing in software engineering is that code is meant to be read, not written or run efficiently (if the latter is the case, you'd rather switch from Python to a system-level language). Asyncio came with the missing part for async programming: predefined and controlled context switch points. You're actually writing sync code (i.e. you're not thinking about sudden thread switches, locks, queues, etc.), and using await ... when you know a call is I/O-blocking, so you let the event loop pick up something else that is ready for the CPU, and pick up the current state later.
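To make "predefined and controlled context switch points" concrete, here is a minimal sketch (the function names are hypothetical). Two tasks increment a shared counter; in the first version there is no `await` between the read and the write, so no other task can interleave there. Moving the `await` between the read and the write creates a visible switch point, and updates get lost. With gevent, by contrast, any monkey-patched I/O call inside such a section could be a hidden switch point, and nothing in the code marks it.

```python
import asyncio


async def increment_atomically(state, n):
    for _ in range(n):
        value = state["count"]
        state["count"] = value + 1  # no await between read and write: no other task runs here
        await asyncio.sleep(0)      # explicit switch point, placed after the update


async def increment_racy(state, n):
    for _ in range(n):
        value = state["count"]
        await asyncio.sleep(0)      # switch point between read and write: other tasks interleave
        state["count"] = value + 1


async def run_pair(worker, n=1000):
    # Run two copies of the worker concurrently against one shared counter.
    state = {"count": 0}
    await asyncio.gather(worker(state, n), worker(state, n))
    return state["count"]


safe = asyncio.run(run_pair(increment_atomically))
racy = asyncio.run(run_pair(increment_racy))
print(safe)  # 2000
print(racy)  # fewer than 2000: some updates were overwritten
```

Because the only places a task can be suspended are the `await` expressions, a reader can find every potential race by looking for them.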

This is what makes asyncio so good: it's easy to maintain. The downside is that pretty much the whole "world" must be async too: DB drivers, HTTP tools, file handlers. And sometimes you'll be missing libraries; that's pretty much guaranteed.
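The downside above shows up as soon as a synchronous library slips into async code. In this minimal sketch, `time.sleep` stands in for any blocking driver call (an assumption for illustration): three coroutines that `await asyncio.sleep` overlap their waits, while three that call `time.sleep` stall the whole event loop and run one after another. Under gevent, `monkey.patch_all()` replaces `time.sleep` with a cooperative version, which is exactly the problem monkey-patching sidesteps.

```python
import asyncio
import time


async def cooperative(delay):
    await asyncio.sleep(delay)  # yields to the event loop while waiting


async def blocking(delay):
    time.sleep(delay)  # blocking call: nothing else on the loop can run meanwhile


async def timed(worker, n=3, delay=0.2):
    # Run n copies of the worker concurrently and measure wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(n)))
    return time.perf_counter() - start


overlapped = asyncio.run(timed(cooperative))  # about 0.2 s: the waits overlap
serialized = asyncio.run(timed(blocking))     # at least 0.6 s: the waits run back to back
print(f"{overlapped:.2f}s vs {serialized:.2f}s")
```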

Slam
  • I'm not sure I understand the comparison. You don't have to understand what `patch_all` does any more than you have to understand how `asyncio` is implemented; you just write straightforward, readable and maintainable synchronous code (and debug it as such), then magically make it asynchronous. What am I missing? – Dan Gittik Jan 18 '19 at 14:33
  • You can't debug patched code in the same way as sync code; that's the point. Running gevent-patched code with a single-user load and with, let's say, 100k users is different in terms of when and how the context will be switched. The implementation of asyncio (which is hard, for sure) is different from understanding the concept of an event loop. The latter is simple enough. – Slam Jan 18 '19 at 14:40
  • @DanGittik "you just write ... synchronous code ..., then magically make it asynchronous" - if you try to do it, you won't get any benefit from making code asynchronous. You'll have to run some tasks (coroutines) concurrently to get a benefit ([example how gevent does it](http://www.gevent.org/intro.html#example)). And wherever concurrency appears, synchronization between coroutines will sooner or later follow, along with problems like deadlocks and many other async-specific things. These are usually non-trivial, and `asyncio` allows you to read and debug them better than the "magical" approach. – Mikhail Gerasimov Jan 18 '19 at 17:02
  • It's not the worst thing, it's how it should work. Everything is async in Golang, and it just works. In Python, there are multiple choices, and all of them are still broken in one way or another. Just compare subprocess and asyncio.subprocess to see how the standard library ships two ways of doing the same thing which don't have feature parity. That's what I call broken. – Kostja Nov 19 '19 at 11:48
  • The comparison with Golang is not quite right. Async was in Golang from the start, whereas async was introduced into existing Python and is effectively a language fork. And it's not a necessary one: it was possible to implement async in Python without a hard language fork; the language designers had a religious aversion to it. – user48956 Jun 10 '20 at 00:56
  • Not sure why everybody is supposed to patch everything. If you know where the I/O is happening, convert that to a greenlet. Then you have the most control. Just to download thousands of URLs, you don't need to patch everything; just convert the downloading code to a coroutine. – Shiplu Mokaddim Jun 14 '20 at 15:09
  • Yeah, no. All async does is add async / await everywhere and you're left with the exact same problem, trying to puzzle through the execution of coroutines. Having all the extra markup fluff doesn't make it more readable or understandable. Better would be to simply identify and document async entrypoints. – Blaze Oct 19 '20 at 15:19
  • Well, async/await is exactly "documenting async entrypoints", with the bonus of not having any other context switch aside from awaitables. – Slam Oct 19 '20 at 22:00
  • That's unfortunately not true and I don't get where that notion originated as I've read it in many places - you have to maintain a chain of colored functions in order to await deeper in the stack. – Blaze Oct 25 '20 at 22:22
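The "chain of colored functions" in the last comment can be seen in a minimal sketch (all names are hypothetical). Because the bottom-level call must be awaited, every caller above it has to be `async` as well; a plain function that calls a coroutine function only gets back an unstarted coroutine object, not the result. With gevent, the same chain would consist of ordinary functions.

```python
import asyncio
import inspect


async def fetch_row(key):
    # Hypothetical async driver call at the bottom of the stack.
    await asyncio.sleep(0)
    return {"key": key}


async def load_user(key):
    # Because fetch_row must be awaited, this caller has to be async too...
    return await fetch_row(key)


async def handle_request(key):
    # ...and so does every caller above it.
    user = await load_user(key)
    return user["key"]


def sync_helper(key):
    # A plain function cannot await: calling a coroutine function here only
    # creates a coroutine object; the body of load_user never runs.
    return load_user(key)


pending = sync_helper("alice")
print(inspect.iscoroutine(pending))  # True
pending.close()  # discard it explicitly to avoid a "never awaited" warning

print(asyncio.run(handle_request("alice")))  # alice
```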