GEvent / GUnicorn and the C10k issue

Question

The C10K problem tells us that conventional web servers handle, at best, around 10k simultaneous connections.

Servers like nginx use a single-threaded model and asynchronous communication instead of threads to handle the incoming requests. AFAIK Gevent uses greenlets (switchable execution contexts inside the same thread) instead of threads.

This leads me to two questions (again: assume we're in an asynchronous model, think of gevent and gunicorn):

Under those circumstances, does a risk of resource hogging exist? For greenlet-based servers I'll narrow the question further: assume the resource hogging is actually a mutex lock (a mutex lock blocks the current thread, although not the current process; but we're no longer in a multithreaded architecture if we use greenlets... am I wrong?).
If we're not in a greenlet-based architecture (nor a threaded one), how are WebSockets implemented in the server?

And an additional question goes for Django:

How do I identify the current request when I'm not inside a view and cannot directly reach the view parameters? I had the bad practice of tying the current request to the current thread using a threading.local (populated inside a custom middleware), but at that time I did not consider non-threaded architectures (my code was fine as long as I could say "one request implies one thread").

This would help me in one scenario: identifying the current request when a form calls my custom field's clean() method (i.e. validating the value against data that depends on the current request). However, this method would fail if I have simultaneous requests surpassing the 10k limit and use an asynchronous (non-threaded) approach.

You should ask one question in one post. This is not a discussion forum. You are asking for help with your solutions but you haven't explained your problem. — Burhan Khalid, Jul 18 '14 at 15:11

Luis Masuelli · Answer 1 · 2014-07-18T15:35:35.597

(EDIT: gevent.monkey.patch_all(), run in the wsgi.py script file, automatically patches thread locals to become greenlet locals, so this Werkzeug-based alternative is not needed for GEvent (or GUnicorn with GEvent workers). If you somehow use greenlets without GEvent, you may still need this solution.)

I found an answer myself when I remembered the Flask framework:

Flask is a framework that exposes many objects "globally", like session and request, which "look" like threading.local objects. The main difference is that they are context locals instead of thread locals, where a context is whatever execution stack is currently running.

A thread has its own context (hence the concept of a context switch in threading theory). A process has its own context, which contains its threads (including a main thread).

So far, in the theory we know, a process contains threads, and each thread has its own execution context. Data is shared among threads unless a thread creates its own data context. This is where the concept of thread-local data appears.

But to address concurrent execution, and considering the C10K problem, asynchronous execution in one thread was preferred over multiple blocking threads with their corresponding context switches (especially in Python, where the default distribution has the GIL). The greenlet was created as a context that switches within the same thread, and now the hierarchy has changed:

Process 1--* thread 1--* greenlet (and now the requests are here)

So greenlets were created and implemented in Python, and are used by libraries like GEvent. With them you can no longer rely on thread-local data, because requests are no longer bound to threads (i.e. several requests could share the same thread-local context, racing over its data).
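
The race described above can be reproduced with the standard library alone. This is only a sketch: asyncio tasks stand in for greenlets (an assumption made so the example runs without GEvent installed), and the names `_state` and `handle` are hypothetical. Both handlers run on one thread, so they share one `threading.local` namespace.

```python
import asyncio
import threading

# Thread-local storage: one namespace per OS thread, not per task.
_state = threading.local()

async def handle(request_id, seen):
    # Store "the current request", then yield control, as a greenlet
    # would when it blocks on I/O.
    _state.request = request_id
    await asyncio.sleep(0)
    # Record what this handler sees once it is resumed.
    seen[request_id] = _state.request

async def main():
    seen = {}
    # Both handlers run cooperatively on the same thread.
    await asyncio.gather(handle("A", seen), handle("B", seen))
    return seen

seen = asyncio.run(main())
print(seen)  # handler "A" ends up seeing request "B"
```

Handler "A" stores its request, yields, and finds handler "B"'s request when it resumes, which is the kind of cross-request leak the answer warns about.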

Now the context itself is the greenlet, and we need a notion of context local instead of thread locals.
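
As a rough illustration of what a context local is, here is a minimal sketch using the standard library's `contextvars` module. It is not what GEvent or Werkzeug are shown to do in this post: asyncio tasks stand in for greenlets, and `current_request` and `handle` are hypothetical names. Each task gets its own copy of the context, so a value set in one task is invisible to the other.

```python
import asyncio
import contextvars

# A context-local variable: each asyncio task runs in its own context copy.
current_request = contextvars.ContextVar("current_request", default=None)

async def handle(request_id, seen):
    # Set the value for this task's context only, then yield control.
    current_request.set(request_id)
    await asyncio.sleep(0)
    # After resuming, the task still sees its own value.
    seen[request_id] = current_request.get()

async def main():
    seen = {}
    await asyncio.gather(handle("A", seen), handle("B", seen))
    return seen

isolated = asyncio.run(main())
print(isolated)
```

Unlike a thread local shared by every task on the thread, the value stays bound to the execution context that set it.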

So: how does Flask use a context local that isolates data for each request (e.g. a session, a request)? The answer to context-agnostic isolation is here:

Werkzeug's Context Locals

Werkzeug and Flask have the same creator. Werkzeug is not a Framework but just a set of utilities you can use in any WSGI framework (e.g. Django). The framework itself is Flask, which actually depends on Werkzeug's utilities.

Werkzeug's context locals let us create proper context locals (the context being a thread, a request, or a process, depending on how the server dispatches requests), which we can use to store greenlet-specific data and avoid thread locals:

# A Python module in my Django project where I define a custom
# field class that needs static access to the current request.

# I previously used a thread local here; the usage is THE SAME.
# The main difference is that threads are garbage-collected, while
# contexts are not necessarily, so you must ALWAYS release the data
# explicitly with release_local for the current context.

# The code below used to hold `threading.local` instances instead
# of `werkzeug.local.Local` instances. Assigning data works as
# before; only releasing it differs.

from werkzeug.local import Local, release_local

class AutocompleteField(object):

    DATA = Local()

    @staticmethod
    def set_request(request):
        AutocompleteField.DATA.request = request

    @staticmethod
    def unset_request(request):
        release_local(AutocompleteField.DATA)

    @staticmethod
    def get_request():
        try:
            return AutocompleteField.DATA.request
        except AttributeError:
            # No request was stored for the current context.
            return None
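
The post mentions a custom middleware that populates the storage but does not show it. Below is a minimal sketch of how such a middleware might look. Everything in it is an assumption: `RequestHolder` is a hypothetical stand-in for `AutocompleteField` (backed by `threading.local` so the sketch runs without Werkzeug or Django), and the middleware follows the callable style of newer Django versions rather than the `process_request`/`process_response` hooks current in 2014.

```python
import threading

class RequestHolder:
    """Hypothetical stand-in for AutocompleteField's request storage."""
    _data = threading.local()

    @classmethod
    def set_request(cls, request):
        cls._data.request = request

    @classmethod
    def unset_request(cls):
        # Drop everything stored for the current thread.
        cls._data.__dict__.clear()

    @classmethod
    def get_request(cls):
        return getattr(cls._data, "request", None)


class CurrentRequestMiddleware:
    """Callable middleware: stores the request, always releases it."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        RequestHolder.set_request(request)
        try:
            return self.get_response(request)
        finally:
            # Release even if the view raises, so no stale request leaks
            # into the next request served by this context.
            RequestHolder.unset_request()


def fake_view(request):
    # Stands in for code (e.g. a field's clean()) with no request argument.
    return "saw %s" % RequestHolder.get_request()

middleware = CurrentRequestMiddleware(fake_view)
response = middleware("request-1")
```

The `try`/`finally` is the design point: since contexts are not necessarily cleaned up automatically, the release has to happen on every exit path.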

I am not sure how you helped him because he is talking about django. — Burhan Khalid, Jul 18 '14 at 15:12
Actually, the post is mine. Although this is Django, you can freely use Werkzeug's utilities (they are not just for Flask, but for any WSGI app). However, I found the true answer in the monkey.patch_all() method, which is why I edited the post. — Luis Masuelli, Jul 18 '14 at 15:33
