
We're running a Flask app exposing data stored in a database. It returns a lot of 503 errors. My understanding is that those are generated by Apache when the maximum number of concurrent threads is reached.

The root cause is most probably the app performing poorly but at this stage, we can't afford much more development time, so I'm looking for a cheap deployment config hack to mitigate the issue.

  • Data providers are sending data at a high rate. I believe their programs get a lot of 503s and simply catch them and retry until success.

  • Data consumers use the app at a much lower rate and I'd like them not to be bothered so much by those issues.

I'm thinking of limiting the number of concurrent accesses from the IP of each provider. They may get a lower throughput but they'd live with it as they already do, and it would make life easier for casual consumers.


I identified the mod_limitipconn module, which seems to be tailored for this.

mod_limitipconn [...] allows administrators to limit the number of simultaneous requests permitted from a single IP address.

I'd like to be sure I understand how it works and how the limits are set.

I always figured there was a maximum of 5 simultaneous connections due to the WSGI setting threads=5. But I read Processes and Threading in the mod_wsgi docs and I'm confused.

Considering the configuration below, are those assumptions correct?

  • Only one instance of the application is running at a time.

  • A maximum of 5 concurrent threads can be spawned.

  • When 5 requests are being processed and a sixth request arrives, the client gets a 503.

  • Limiting the number of simultaneous requests from IP x.x.x.x at the Apache level to 3 would ensure that only 3 of those 5 threads can be used by that IP, leaving 2 for the other IPs.

  • Raising the number of threads in the WSGI config could help share the connection pool amongst clients by allowing more granular rate limits (e.g. limit each of 4 providers to 3 and keep 5 more, for a total of 17), but it would not improve overall performance, even if the server has idle cores, because the Python GIL prevents several threads from running at the same time.

  • Raising the number of threads to a high number like 100 may make requests slower but would reduce 503 responses. It might even be enough if the clients keep their own concurrency limit reasonably low, and if they don't, I can enforce one with something like mod_limitipconn.

  • Raising the number of threads too much would make requests so long that clients would get timeouts instead of 503s, which is not really better.
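For reference, here's roughly what I imagine the mod_limitipconn config would look like (a sketch, untested; as far as I can tell the module caps every client IP uniformly rather than targeting specific IPs, and its README says it needs mod_status with ExtendedStatus On):

```apache
# Sketch only: assumes mod_limitipconn and mod_status are installed and enabled.
ExtendedStatus On

<IfModule mod_limitipconn.c>
    <Location "/">
        # Cap every client IP at 3 simultaneous requests,
        # leaving 2 of the 5 WSGI threads for other clients.
        MaxConnPerIP 3
    </Location>
</IfModule>
```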


Current config below. Not sure what matters.

apachectl -V:

Server version: Apache/2.4.25 (Debian)
Server built:   2018-06-02T08:01:13
Server's Module Magic Number: 20120211:68
Server loaded:  APR 1.5.2, APR-UTIL 1.5.4
Compiled using: APR 1.5.2, APR-UTIL 1.5.4
Architecture:   64-bit
Server MPM:     event
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)

/etc/apache2/apache2.conf:

# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

/etc/apache2/mods-available/mpm_worker.conf (but that shouldn't matter in event mode, right?):

<IfModule mpm_worker_module>
        StartServers                     2
        MinSpareThreads          25
        MaxSpareThreads          75
        ThreadLimit                      64
        ThreadsPerChild          25
        MaxRequestWorkers         150
        MaxConnectionsPerChild   0
</IfModule>

/etc/apache2/sites-available/my_app.conf:

WSGIDaemonProcess my_app threads=5
Jérôme
  • What WSGI are you using? Perhaps I overlooked in your detailed question. Really good detail by the way. Have you got this hooked up to gunicorn or something similar? – Swift Sep 12 '18 at 17:10
  • Answered my own question. You are using the built in apache one. I have only ever used Flask in conjunction with gunicorn and Nginx as a reverse web proxy. I'm pretty sure your problem lies therein. – Swift Sep 12 '18 at 17:14
  • Nope, no gunicorn involved. Only WSGI over apache. – Jérôme Sep 12 '18 at 21:18

2 Answers


I'd like them not to be bothered so much by those issues.

So separate the data providers' requests from the data consumers' (I'm not familiar with Apache, so I'm not showing you a production-ready config but an overall approach):

<VirtualHost *>
    ServerName example.com

    WSGIDaemonProcess consumers user=user1 group=group1 threads=5
    WSGIDaemonProcess providers user=user1 group=group1 threads=5
    WSGIScriptAliasMatch ^/consumers_urls/.* /path_to_your_app/consumers.wsgi process-group=consumers
    WSGIScriptAliasMatch ^/providers_urls/.* /path_to_your_app/providers.wsgi process-group=providers

    ...

</VirtualHost>

Limiting the number of requests per IP can harm the user experience and still not solve your problem. For example, note that many independent users may share the same IP because of how NAT and ISPs work.

P.S. It's quite strange that ThreadsPerChild=25 but WSGIDaemonProcess my_app threads=5. Are you sure that with that config all the threads created by Apache are utilized by the WSGI server?
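As I understand the mod_wsgi docs, the two numbers are largely independent in daemon mode: Apache's MPM threads accept connections and proxy requests to the daemon process, whose own processes/threads settings cap how many requests the Python app handles at once. A sketch (not the OP's config; the numbers are illustrative):

```apache
# In daemon mode the app runs in its own process pool, separate from
# Apache's MPM worker threads (ThreadsPerChild).
WSGIDaemonProcess my_app processes=2 threads=5   # up to 10 concurrent app requests
WSGIProcessGroup my_app
```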

Fine
  • Thanks. I get the idea. I can't really separate the routes of consumers and providers. I made it simple in the example, but consumers may also produce a bit of data and send it with the same routes. My idea was to limit the rate for a given set of IPs, which mod_limitipconn might not be able to do, but limiting simultaneous connections to all IPs should not be a great issue. Good point about the NAT, though. – Jérôme Sep 13 '18 at 12:35
  • `ThreadsPerChild=25` is a default I never bothered about until posting this question and I'm not sure it is even used since it appears in `mpm_worker.conf` and we're using `event` mode. My understanding of WSGI/apache is that apache may have 25 threads for Python and other stuff (static files) but with the `WSGIDaemonProcess my_app threads=5`, the Python app only gets 5. But it is all blurry to me, hence this question. – Jérôme Sep 13 '18 at 12:39

I ended up following a different approach. I added a limiter in the application code to take care of this.

"""Concurrency requests limiter

Inspired by Flask-Limiter
"""

from collections import defaultdict
from threading import BoundedSemaphore
from functools import wraps

from flask import request
from werkzeug.exceptions import TooManyRequests


# From flask-limiter
def get_remote_address():
    """Get IP address for the current request (or 127.0.0.1 if none found)

    This won't work behind a proxy. See flask-limiter docs.
    """
    return request.remote_addr or '127.0.0.1'


class NonBlockingBoundedSemaphore(BoundedSemaphore):
    def __enter__(self):
        ret = self.acquire(blocking=False)
        if ret is False:
            raise TooManyRequests(
                'Only {} concurrent request(s) allowed'
                .format(self._initial_value))
        return ret


class ConcurrencyLimiter:

    def __init__(self, app=None, key_func=get_remote_address):
        self.app = app
        self.key_func = key_func
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        self.app = app
        app.extensions = getattr(app, 'extensions', {})
        app.extensions['concurrency_limiter'] = {
            'semaphores': defaultdict(dict),
        }

    def limit(self, max_concurrent_requests=1):
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                # Limiter not initialized
                if self.app is None:
                    return func(*args, **kwargs)
                identity = self.key_func()
                sema = self.app.extensions['concurrency_limiter'][
                    'semaphores'][func].setdefault(
                        identity,
                        NonBlockingBoundedSemaphore(max_concurrent_requests)
                    )
                with sema:
                    return func(*args, **kwargs)
            return wrapper
        return decorator


limiter = ConcurrencyLimiter()


def init_app(app):
    """Initialize limiter"""

    limiter.init_app(app)
    if app.config['AUTHENTICATION_ENABLED']:
        from h2g_platform_core.api.extensions.auth import get_identity
        limiter.key_func = get_identity

Then all I need to do is apply that decorator to my views:

@limiter.limit(1)  # One concurrent request by user
def get(...):
    ...

In practice, I only protected the ones that generate high traffic.

Doing this in application code is nice because I can set a limit per authenticated user and not per IP.

To do so, all I need to do is replace the default get_remote_address in key_func with a function that returns the user's unique identifier.

Note that this sets a different limit for each view function. If the limit needs to be global, it can be implemented differently. In fact, it would be even simpler.

Jérôme