
I am new to gevent and greenlets. I found some good documentation on how to work with them, but none gave me any justification for how and when I should use greenlets!

  • What are they really good at?
  • Is it a good idea to use them in a proxy server or not?
  • Why not threads?

What I am not sure about is how they can provide us with concurrency if they're basically co-routines.

Rsh
  • @Imran It's about greenthreads in Java. My question is about greenlet in Python. Am I missing something? – Rsh Mar 21 '13 at 20:07
  • Afaik, threads in python are actually not really concurrent because of the global interpreter lock. So it would boil down to comparing overhead of both solutions. Although I understand that there are several implementations of python, so this may not apply for all of them. – didierc Mar 22 '13 at 22:17
  • @didierc CPython (and PyPy as of now) will not interpret Python (byte)code *in parallel* (that is, really physically at the same time on two distinct CPU cores). However, not everything a Python program does is under the GIL (common examples are syscalls including I/O and C functions that deliberately release the GIL), and a `threading.Thread` is actually an OS thread with all ramifications. So it's really not quite that simple. By the way, Jython has no GIL AFAIK and PyPy's trying to get rid of it too. –  Mar 22 '13 at 22:39

4 Answers


Greenlets provide concurrency but not parallelism. Concurrency is when code can run independently of other code. Parallelism is the execution of concurrent code simultaneously. Parallelism is particularly useful when there's a lot of work to be done in userspace, and that's typically CPU-heavy stuff. Concurrency is useful for breaking apart problems, enabling different parts to be scheduled and managed more easily in parallel.

Greenlets really shine in network programming, where interactions with one socket can occur independently of interactions with other sockets. This is a classic example of concurrency. Because each greenlet runs in its own context, you can continue to use synchronous APIs without threading. This is good because threads are very expensive in terms of virtual memory and kernel overhead, so the concurrency you can achieve with threads is significantly less. Additionally, threading in Python is more expensive and more limited than usual due to the GIL. Alternatives to this kind of concurrency are usually event-driven projects like Twisted, libevent, libuv, node.js, etc., where all your code shares the same execution context and registers event handlers.
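To make that concrete, here is a minimal sketch (assuming gevent is installed; the task names and sleep duration are illustrative stand-ins for blocking network calls) of several greenlets running synchronous-looking code concurrently:

```python
import gevent

def fetch(name, results):
    # gevent.sleep stands in for a blocking network call; it yields to the
    # event loop so the other greenlets can run while this one "waits"
    gevent.sleep(0.1)
    results.append(name)

results = []
jobs = [gevent.spawn(fetch, n, results) for n in ("a", "b", "c")]
gevent.joinall(jobs)
# all three greenlets complete after roughly one sleep interval, not three
```

Each greenlet keeps its own stack, so `fetch` reads like ordinary sequential code even though the three calls overlap.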

It's an excellent idea to use greenlets (with appropriate networking support, such as through gevent) for writing a proxy, as your request handlers are able to execute independently and should be written as such.

Greenlets provide concurrency for the reasons I gave earlier. Concurrency is not parallelism. By concealing event registration and performing scheduling for you on calls that would normally block the current thread, projects like gevent expose this concurrency without requiring change to an asynchronous API, and at significantly less cost to your system.
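The usual way gevent conceals that event registration is monkey patching, which swaps blocking standard-library calls for cooperative ones. A minimal sketch (the worker count and sleep are illustrative):

```python
from gevent import monkey
monkey.patch_all()  # must run before other imports that grab socket/time/etc.

import time
import gevent

def worker(i, out):
    # after patch_all(), time.sleep yields to the event loop
    # instead of blocking the whole program
    time.sleep(0.05)
    out.append(i)

out = []
gevent.joinall([gevent.spawn(worker, i, out) for i in range(3)])
# the three "sleeps" overlap, so this finishes in roughly 0.05s, not 0.15s
```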

Matt Joiner
  • Thanks, just two small questions: 1) Is it possible to combine this solution with multiprocessing to achieve higher throughput? 2) I still don't know why ever use threads? Can we consider them as a naive and basic implementation of concurrency in the Python standard library? – Rsh Mar 24 '13 at 08:23
  • 1) Yes, absolutely. You shouldn't do this prematurely, but because of a whole bunch of factors beyond the scope of this question, having multiple processes serve requests will give you higher throughput. 2) OS threads are preemptively scheduled, and fully parallelized by default. They are the default in Python because Python exposes the native threading interface, and threads are the best supported and lowest common denominator for both parallelism and concurrency in modern operating systems. – Matt Joiner Mar 24 '13 at 11:09
  • I should mention that you shouldn't even be using greenlets until threads aren't satisfactory (usually this occurs because of the number of simultaneous connections you're handling, and either the thread count or the GIL are giving you grief), and even then only if there isn't some other option available to you. The Python standard library, and most third-party libraries, *expect* concurrency to be achieved through threads, so you may get strange behaviour if you provide that via greenlets. – Matt Joiner Mar 24 '13 at 11:11
  • @MattJoiner I have the below function which reads the huge file to calculate the md5 sum. how can i use gevent in this case to read faster `import hashlib def checksum_md5(filename): md5 = hashlib.md5() with open(filename,'rb') as f: for chunk in iter(lambda: f.read(8192), b''): md5.update(chunk) return md5.digest()` – Soumya Jul 25 '20 at 23:17

Correcting @TemporalBeing's answer above: greenlets are not "faster" than threads, and it is an incorrect programming technique to spawn 60,000 threads to solve a concurrency problem; a small pool of threads is appropriate instead. Here is a more reasonable comparison (from my reddit post in response to people citing this SO post).

import gevent
from gevent import socket as gsock
import socket as sock
import threading
from datetime import datetime


def timeit(fn, URLS):
    t1 = datetime.now()
    fn()
    t2 = datetime.now()
    print(
        "%s / %d hostnames, %s seconds" % (
            fn.__name__,
            len(URLS),
            (t2 - t1).total_seconds()
        )
    )


def run_gevent_without_a_timeout():
    ip_numbers = []

    def greenlet(domain_name):
        ip_numbers.append(gsock.gethostbyname(domain_name))

    jobs = [gevent.spawn(greenlet, domain_name) for domain_name in URLS]
    gevent.joinall(jobs)
    assert len(ip_numbers) == len(URLS)


def run_threads_correctly():
    ip_numbers = []

    def process():
        # drain the shared work list; list.pop() is atomic under the GIL,
        # so the worker threads need no explicit lock
        while queue:
            try:
                domain_name = queue.pop()
            except IndexError:
                pass
            else:
                ip_numbers.append(sock.gethostbyname(domain_name))

    threads = [threading.Thread(target=process) for i in range(50)]

    queue = list(URLS)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert len(ip_numbers) == len(URLS)

URLS_base = ['www.google.com', 'www.example.com', 'www.python.org',
             'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']

for NUM in (5, 50, 500, 5000, 10000):
    URLS = []

    for _ in range(NUM):
        for url in URLS_base:
            URLS.append(url)

    print("--------------------")
    timeit(run_gevent_without_a_timeout, URLS)
    timeit(run_threads_correctly, URLS)

Here are some results:

--------------------
run_gevent_without_a_timeout / 30 hostnames, 0.044888 seconds
run_threads_correctly / 30 hostnames, 0.019389 seconds
--------------------
run_gevent_without_a_timeout / 300 hostnames, 0.186045 seconds
run_threads_correctly / 300 hostnames, 0.153808 seconds
--------------------
run_gevent_without_a_timeout / 3000 hostnames, 1.834089 seconds
run_threads_correctly / 3000 hostnames, 1.569523 seconds
--------------------
run_gevent_without_a_timeout / 30000 hostnames, 19.030259 seconds
run_threads_correctly / 30000 hostnames, 15.163603 seconds
--------------------
run_gevent_without_a_timeout / 60000 hostnames, 35.770358 seconds
run_threads_correctly / 60000 hostnames, 29.864083 seconds

The misunderstanding everyone has about non-blocking IO with Python is the belief that the Python interpreter can attend to the work of retrieving results from sockets at large scale faster than the network connections themselves can return IO. While this is certainly true in some cases, it is not true nearly as often as people think, because the Python interpreter is really, really slow. In my blog post here, I illustrate some graphical profiles showing that, even for very simple things, if you are dealing with crisp and fast network access to things like databases or DNS servers, those services can come back a lot faster than the Python code can attend to many thousands of those connections.

zzzeek
  • I changed your code slightly to do an actual `socket.connect` operation rather than simply resolving the hostname to IP, and the greenlet code was much faster (about 3x faster/10000 socket connections per second). I had to use `gevent.pool.Pool(50)` to restrict the number of open sockets when using greenlets. I think a test using socket.connect is a more realistic usage of greenlets when it comes to network programming, than one that simply resolves the hostname. – smac89 Sep 12 '22 at 14:44
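As the comment above notes, `gevent.pool.Pool` can cap how many greenlets are in flight at once. A small sketch (the pool size and sleep are illustrative) showing the cap being respected:

```python
import gevent
from gevent.pool import Pool

pool = Pool(2)           # at most 2 greenlets in flight at a time
active, peak = [], []

def job(i):
    active.append(i)
    peak.append(len(active))   # record how many jobs are running right now
    gevent.sleep(0.01)         # stand-in for a blocking network call
    active.remove(i)

pool.map(job, range(6))
# peak never exceeds the pool size of 2
```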

Taking @max's answer and scaling it up, you can see the difference. I achieved this by changing the URLs to be filled as follows:

URLS_base = ['www.google.com', 'www.example.com', 'www.python.org', 'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']
URLS = []
for _ in range(10000):
    for url in URLS_base:
        URLS.append(url)

I had to drop the multiprocessing version, as it failed before I reached 500 iterations; but at 10,000 iterations:

Using gevent it took: 3.756914
-----------
Using multi-threading it took: 15.797028

So you can see there is a significant difference in I/O performance when using gevent.

TemporalBeing
  • It is entirely incorrect to spawn 60000 native threads or processes to complete the work, and this test shows nothing (also, did you take the timeout off of the gevent.joinall() call?). Try using a thread pool of about 50 threads; see my answer: https://stackoverflow.com/a/51932442/34549 – zzzeek Aug 20 '18 at 14:14

This is interesting enough to analyze. Here is code comparing the performance of greenlets versus a multiprocessing pool versus multi-threading:

import gevent
from gevent import socket as gsock
import socket as sock
from multiprocessing import Pool
from threading import Thread
from datetime import datetime

class IpGetter(Thread):
    def __init__(self, domain):
        Thread.__init__(self)
        self.domain = domain
    def run(self):
        self.ip = sock.gethostbyname(self.domain)

if __name__ == "__main__":
    URLS = ['www.google.com', 'www.example.com', 'www.python.org', 'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']
    t1 = datetime.now()
    jobs = [gevent.spawn(gsock.gethostbyname, url) for url in URLS]
    gevent.joinall(jobs, timeout=2)
    t2 = datetime.now()
    print("Using gevent it took: %s" % (t2-t1).total_seconds())
    print("-----------")
    t1 = datetime.now()
    pool = Pool(len(URLS))
    results = pool.map(sock.gethostbyname, URLS)
    t2 = datetime.now()
    pool.close()
    print("Using multiprocessing it took: %s" % (t2-t1).total_seconds())
    print("-----------")
    t1 = datetime.now()
    threads = []
    for url in URLS:
        t = IpGetter(url)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    t2 = datetime.now()
    print("Using multi-threading it took: %s" % (t2-t1).total_seconds())

Here are the results:

Using gevent it took: 0.083758
-----------
Using multiprocessing it took: 0.023633
-----------
Using multi-threading it took: 0.008327

I think greenlet claims that it is not bound by the GIL, unlike the multithreading library. Moreover, the greenlet docs say it is meant for network operations. For a network-intensive operation, thread-switching is fine, and you can see that the multithreading approach is pretty fast. Also, it's always preferable to use Python's official libraries; I tried installing greenlet on Windows and encountered a DLL dependency problem, so I ran this test on a Linux VM. Always try to write code with the hope that it runs on any machine.
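On the point about preferring official libraries: since Python 3.2, `concurrent.futures` in the standard library gives you a bounded thread pool with no third-party dependency. A rough equivalent of the lookup test (resolving `localhost` here so the sketch works offline; the pool size is illustrative):

```python
import socket
from concurrent.futures import ThreadPoolExecutor

names = ["localhost"] * 6  # stand-in hostnames; real code would use the URLS list

# The GIL is released while each worker thread sits inside the underlying
# getaddrinfo syscall, so the lookups overlap even under CPython.
with ThreadPoolExecutor(max_workers=3) as executor:
    ips = list(executor.map(socket.gethostbyname, names))
```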

max
  • Note that ``gethostbyname`` caches the results at the OS level (at least on my machine it does). When invoked on a previously unknown or expired DNS name it will actually perform a network query, which might take some time. When invoked on a hostname that has just recently been resolved it will return the answer much faster. Consequently, your measurement methodology is flawed here. This explains your strange results: gevent cannot really be that much worse than multithreading; both are not-really-parallel at the VM level. – KT. Apr 19 '15 at 15:21
  • @KT. that is an excellent point. You would need to run that test many times and take means, modes and medians to get a good picture. Note also that routers cache route paths for protocols, and where they don't cache route paths you could get different lag from different DNS route path traffic. And DNS servers heavily cache. It might be better to measure threading using time.clock() where CPU cycles are used, instead of being affected by latency over network hardware. This could eliminate other OS services sneaking in and adding time to your measurements. – DevPlayer Oct 13 '16 at 01:27
  • Oh and you can run a dns flush at the OS level between those three tests but again that would only reduce false data from local dns caching. – DevPlayer Oct 13 '16 at 01:29
  • Yup. Running this cleaned up version: https://paste.ubuntu.com/p/pg3KTzT2FG/ I get pretty much identical-ish times... `using_gevent() 421.442985535ms using_multiprocessing() 394.540071487ms using_multithreading() 402.48298645ms` – sehe May 14 '18 at 23:37
  • I think OSX is doing dns caching but on Linux it's not a "default" thing: https://stackoverflow.com/a/11021207/34549 , so yes, at low levels of concurrency greenlets are that much worse due to interpreter overhead – zzzeek Aug 20 '18 at 14:06