
I know it's not recommended to run a Bottle or Flask app on production with python myapp.py --port=80 because it's a development server only.

I think it's not recommended as well to run it with python myapp.py --port=5000 and link it to Apache with: RewriteEngine On, RewriteRule /(.*) http://localhost:5000/$1 [P,L] (or am I wrong?), because WSGI is preferred.

So I'm currently setting up Python app <-> mod_wsgi <-> Apache (without gunicorn or other tool to keep things simple).

Question: when using WSGI, I know that Apache and mod_wsgi automatically start/stop enough processes running myapp.py as requests come in, but:

  1. how can I manually stop these processes?
  2. more generally, is there a way to monitor them / know how many processes started by mod_wsgi are currently still running? (one reason, among others, is to check if the processes terminate after a request or if they stay running)

Example:

  • I made some changes in myapp.py, and I want to restart all processes running it, that have been launched by mod_wsgi (Note: I know that mod_wsgi can watch changes on the source code, and relaunch, but this only works on changes made on the .wsgi file, not on the .py file. I already read that touch myapp.wsgi can be a solution for that, but more generally I'd like to be able to stop and restart manually)

  • I want to temporarily stop the whole application myapp.py (all instances of it)

I don't want to use service apache2 stop for that because I also run other websites with Apache, not just this one (I have a few VirtualHosts). For the same reason (I run other websites with Apache, and some client might be downloading a 1 GB file at the same time), I don't want to do service apache2 restart that would have an effect on all websites using Apache.

I'm looking for a cleaner way than kill pid or SIGTERM, etc. (because I read it's not recommended to use signals in this case).

Note: I already read How to do graceful application shutdown from mod_wsgi, it helped, but here it's complementary questions, not a duplicate.


My current Python Bottle + Apache + mod_wsgi setup:

  • Installation:

    apt-get install libapache2-mod-wsgi
    a2enmod wsgi      # might be done automatically by previous line, but just to be sure
    
  • Apache config (source: Bottle doc; a more simple config can be found here):

    <VirtualHost *:80>
      ServerName example.com
      WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5
      WSGIScriptAlias / /home/www/wsgi_test/app.wsgi
      <Directory />
        Require all granted
      </Directory>
    </VirtualHost>
    

    There should be up to 5 processes, is that right? As stated before in the question, how to know how many are running, how to stop them?

  • /home/www/wsgi_test/app.wsgi (source: Bottle doc)

    import os
    from bottle import route, template, default_app
    
    os.chdir(os.path.dirname(__file__))
    
    @route('/hello/<name>')
    def index(name):
        return template('<b>Hello {{name}}</b>!', name=name)
    
    application = default_app()
    
Basj
  • Are you using mod_wsgi daemon mode? You should be, as that is the recommended mode. You should avoid using embedded mode. Once using daemon mode, what is the mod_wsgi configuration you are using in the Apache configuration file? – Graham Dumpleton Apr 22 '18 at 11:27
  • So you have now added a bounty, but haven't answered the question as to whether you are using daemon mode and with what configuration. I can answer your questions, but because there are two different modes with mod_wsgi, it is necessary to know which you are using and with what configuration. – Graham Dumpleton Apr 29 '18 at 10:09
  • Simple answer is don't use embedded mode. If you can confirm you are using daemon mode and your current configuration, then I can explain your options. – Graham Dumpleton Apr 29 '18 at 10:13
  • BTW, your interpretation of that document about signal handlers is wrong. You can use signals, but you would rely on the inbuilt capabilities of Apache and mod_wsgi to handle them. You don't need to define your own signal handlers. – Graham Dumpleton Apr 30 '18 at 00:24
  • @GrahamDumpleton Could you just confirm: the current configuration, as shown in the question (inside `VirtualHost`), is `Embedded mode` or is it `Daemon mode`? – Basj Nov 28 '19 at 10:35

3 Answers

2

Taken partially from this question, add display-name to WSGIDaemonProcess so you can grab them using a command like:

ps aux | grep modwsgi

Add this to your configuration:

Define GROUPNAME modwsgi
WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=${GROUPNAME}

Update

There are a couple of reasons why ps would not give you the DaemonProcess display-name.
As shown in the docs:

display-name=value Defines a different name to show for the daemon process when using the ps command to list processes. If the value is %{GROUP} then the name will be (wsgi:group) where group is replaced with the name of the daemon process group.

Note that only as many characters of the supplied value can be displayed as were originally taken up by argv0 of the executing process. Anything in excess of this will be truncated.

This feature may not work as described on all platforms. Typically it also requires a ps program with BSD heritage. Thus on some versions of Solaris UNIX the /usr/bin/ps program doesn’t work, but /usr/ucb/ps does. Other programs which can display this value include htop.

You could:

Set a display-name of smaller length:

WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=wsws

And try to find them by:

ps aux | grep wsws

Or set it to %{GROUP} and filter using the name of the daemon process group (wsgi:group).

Evhz
  • mmm, could be they are not labelled. As soon as they are daemon processes, they are supposed to be in background waiting for requests. Did you try to see how many apache processes are running? Try with the update in case it is a label problem. – Evhz Apr 29 '18 at 12:06
  • @Basj Added docs link. If you set it as `%{GROUP}` it will count the characters of wsgi:group, where group should be the value of `WSGIProcessGroup` – Evhz Apr 29 '18 at 13:16
  • Thanks @Evhz. I posted a [test](https://stackoverflow.com/a/50086901/1422096) that confirms that the processes stay running and don't restart from 0 each time, that's a good thing :) – Basj Apr 29 '18 at 13:21
  • How do you stop the processes @Evhz? Let's imagine you have modified the .py / .wsgi server, do you do `ps aux |grep wsws` and then `kill 1827` (pid)? Isn't there a cleaner way to stop the Python server? – Basj Nov 28 '19 at 10:38
2

The way in which processes are managed with mod_wsgi for each mode is described in:

For embedded mode, where your WSGI application is run inside of the Apache child worker processes, Apache manages when processes are created and destroyed based on the Apache MPM settings. Because of how Apache manages the processes, they can be shutdown at any time if there is insufficient request throughput, or more processes could be created if request throughput increases. When running, the same process will handle many requests over time until it gets shutdown. In other words, Apache dynamically manages the number of processes.

Because of this dynamic process management, it is a bad idea to use embedded mode of mod_wsgi unless you know how to tune Apache properly and many other things as well. In short, never use embedded mode unless you have a good amount of experience with Apache and running Python applications with it. You can watch a video about why you wouldn't want to run in embedded mode at:

There is also the blog post:

So use daemon mode and verify that your configuration is correct and you are in fact using daemon mode by using the check in:
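A minimal sketch of such a check, assuming that mod_wsgi sets the `mod_wsgi.process_group` key in the WSGI environ to an empty string in embedded mode and to the daemon process group name in daemon mode. The helper name `wsgi_mode` is only illustrative and not part of mod_wsgi:

```python
def wsgi_mode(environ):
    """Return 'daemon' or 'embedded' based on the mod_wsgi environ key."""
    # Assumption: an empty or missing value means embedded mode.
    return 'daemon' if environ.get('mod_wsgi.process_group') else 'embedded'


def application(environ, start_response):
    # Tiny WSGI app that reports which mode served the request.
    body = ('mod_wsgi mode: %s\n' % wsgi_mode(environ)).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

Temporarily pointing WSGIScriptAlias at a script like this and requesting it once should be enough to see which mode is in effect.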

For daemon mode, the WSGI application runs in a separate set of managed processes. These are created at the start and will run until Apache is restarted, or until reloading of the process is triggered for various reasons, including:

  • The daemon process is sent a direct signal to shutdown by a user.
  • The code of the application sends itself a signal.
  • The WSGI script file is modified, which will trigger a shutdown so the WSGI application can be reloaded.
  • A defined request timeout occurs due to stuck or long running request.
  • A defined maximum number of requests has occurred.
  • A defined inactivity timeout expires.
  • A defined timer for periodic process restart expires.
  • A startup timeout is defined and the WSGI application failed to load in that time.

In these cases, when the process is shutdown, it is replaced.
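For the "WSGI script file is modified" case, the equivalent of `touch myapp.wsgi` can be sketched in Python as below. The helper name `touch` is illustrative, and the path in the comment is the one from the question's configuration; this assumes daemon mode, where the changed modification time is what triggers the reload.

```python
import os


def touch(path):
    """Set the modification time of an existing file to the current time."""
    os.utime(path, None)
    return os.path.getmtime(path)

# Example, using the path from the question's configuration:
# touch('/home/www/wsgi_test/app.wsgi')
```

This only affects the daemon process group serving that WSGI script, so other VirtualHosts are left alone.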

More details about the various timeout options and how the processes respond to signals can be found in:

More details about source code reloading and touching of the WSGI script file can be found in:

One item which is documented is how you can incorporate code which will look for any changes to Python code files used by your application. When a change occurs to any of the files, the process will be restarted by sending itself a signal. This should only be used for development and never in production.
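A rough sketch of that idea, for development only. The function names are hypothetical, and the choice of SIGINT as the signal the process sends itself is an assumption to be checked against the mod_wsgi documentation on source code reloading:

```python
import os
import signal


def snapshot(paths):
    """Record the current modification time of each file."""
    return {p: os.path.getmtime(p) for p in paths}


def changed(paths, previous):
    """Return the files whose modification time differs from the snapshot."""
    return [p for p in paths if os.path.getmtime(p) != previous.get(p)]


def restart_if_changed(paths, previous):
    """Send a signal to the current process if any watched file changed."""
    modified = changed(paths, previous)
    if modified:
        # Development only: the daemon process exits and mod_wsgi replaces it.
        os.kill(os.getpid(), signal.SIGINT)
    return modified
```

In practice `restart_if_changed` would be called periodically from a background thread, with `snapshot` taken once when the WSGI script is loaded.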

If you are using mod_wsgi-express in development, which is preferable to hand configuring Apache yourself, you can use the --reload-on-changes option.

If you send a SIGTERM signal to the daemon process, there is a set shutdown sequence in which it waits a few seconds for current requests to finish. If the requests don't finish, the process is shut down anyway. That period of time is dictated by the shutdown timeout. You shouldn't play with that value.

If you send a SIGUSR1 signal to the daemon process, by default it acts just like sending a SIGTERM signal. If however you specify the graceful timeout for shutdown, you can extend how long it will wait for current requests to finish. New requests will still be accepted during that period. That graceful timeout also applies in other cases, such as the maximum number of requests being reached, or the timer for periodic restart being triggered. If you need the timeout when using SIGUSR1 to be different from those cases, define the eviction timeout instead.

As to how to identify the daemon processes to be sent the signal, use the display-name option of WSGIDaemonProcess. Then use ps to identify the processes, or possibly use killall if it uses the modified process name on your platform. Send the daemon processes the SIGUSR1 signal if you want a more graceful shutdown, and SIGTERM if you want them to restart straight away.
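As a sketch of that procedure in Python: the helper names below are hypothetical, and it assumes a `ps` that accepts `-eo pid,args` and shows the modified process name (see the platform caveats on display-name). The script also needs enough privileges to signal the daemon processes.

```python
import os
import signal
import subprocess


def find_pids(ps_output, display_name):
    """Extract PIDs from 'ps -eo pid,args' output whose command contains display_name."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and display_name in parts[1]:
            pids.append(int(parts[0]))
    return pids


def signal_daemons(display_name, sig=signal.SIGUSR1):
    """Send sig to every process whose ps command line contains display_name."""
    output = subprocess.check_output(['ps', '-eo', 'pid,args'], text=True)
    # Skip this script itself, whose command line may contain the name too.
    pids = [p for p in find_pids(output, display_name) if p != os.getpid()]
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

For example, `signal_daemons('testwsgi')` would request a graceful restart of processes started with `display-name=testwsgi`, and `signal_daemons('testwsgi', signal.SIGTERM)` would restart them straight away.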

If you want to track how long a daemon process has been running, you can use:

import mod_wsgi
metrics = mod_wsgi.process_metrics()

The metrics value will include output like the following for the process the call is made in:

{'active_requests': 1,
 'cpu_system_time': 0.009999999776482582,
 'cpu_user_time': 0.05000000074505806,
 'current_time': 1525047105.710778,
 'memory_max_rss': 11767808,
 'memory_rss': 11767808,
 'pid': 4774,
 'request_busy_time': 0.001851,
 'request_count': 2,
 'request_threads': 2,
 'restart_time': 1525047096.31548,
 'running_time': 9,
 'threads': [{'request_count': 2, 'thread_id': 1},
             {'request_count': 1, 'thread_id': 2}]}
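To turn that dictionary into something readable in a log line or a status page, a small helper along these lines could be used. The function name `summarize` and the output format are illustrative only; the keys are the ones shown in the output above.

```python
import time


def summarize(metrics):
    """Build a one-line summary from a mod_wsgi.process_metrics() style dict."""
    started = time.strftime('%Y-%m-%d %H:%M:%S',
                            time.gmtime(metrics['restart_time']))
    return 'pid=%d up=%ds since=%s UTC requests=%d rss=%.1fMB' % (
        metrics['pid'], metrics['running_time'], started,
        metrics['request_count'], metrics['memory_rss'] / (1024.0 * 1024.0))

# Inside a WSGI script running under mod_wsgi:
# import mod_wsgi
# print(summarize(mod_wsgi.process_metrics()))
```

Comparing `pid` and `restart_time` across requests shows whether a process has stayed up or been replaced.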

If you just want to know how many processes/threads are used for the current daemon process group you can use:

mod_wsgi.process_group
mod_wsgi.application_group
mod_wsgi.maximum_processes
mod_wsgi.threads_per_process

to get details about the process group. The number of processes is fixed at this time for daemon mode, and the name maximum_processes is just to be consistent with what the name is in embedded mode.
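A short sketch that gathers those four attributes into one dictionary. The helper `describe_process_group` is hypothetical; it takes the module as an argument so it can be tried outside Apache with a stand-in object.

```python
def describe_process_group(mod):
    """Collect process-group settings from a mod_wsgi-like module object."""
    names = ('process_group', 'application_group',
             'maximum_processes', 'threads_per_process')
    # getattr with a default keeps this usable outside Apache for testing.
    return {name: getattr(mod, name, None) for name in names}

# Inside a WSGI script running under mod_wsgi:
# import mod_wsgi
# info = describe_process_group(mod_wsgi)
```

With the configuration from the question, one would expect `maximum_processes` and `threads_per_process` to both report 5 when the application really runs in the `yourapp` daemon process group.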

If you need to run code on process shutdown, you should NOT try and define your own signal handlers. Do that and mod_wsgi will actually ignore them as they will interfere with normal operation of Apache and mod_wsgi. Instead, if you need to run code on process shutdown, use atexit.register(). Alternatively, you can subscribe to special events generated by mod_wsgi and trigger something off the process shutdown event.
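A minimal sketch of the atexit.register() approach, placed in the WSGI script or in a module it imports. The `on_shutdown` function and the `shutdown_log` list are placeholders for whatever cleanup the application actually needs:

```python
import atexit

shutdown_log = []


def on_shutdown():
    """Cleanup hook: flush buffers, close connections, and so on."""
    shutdown_log.append('cleanup done')


# Registered once, when the WSGI script is imported by the process.
atexit.register(on_shutdown)
```

The cleanup has to fit within the shutdown sequence described above, so it should be kept short.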

Graham Dumpleton
  • You have ``WSGIProcessGroup``, but that doesn't mean it definitely is as that needs to be placed in right context so applied. That said, the path to the ``Directory`` directive does look correct so will be applied. If want to be absolutely sure is being applied, remove ``WSGIProcessGroup`` and use ``process-group`` option on ``WSGIScriptAlias`` directive instead. You can do same for ``WSGIApplicationGroup`` as well. Also add ``WSGIRestrictEmbedded On`` at top level scope to disable embedded mode completely so can't make a mistake. – Graham Dumpleton May 03 '18 at 21:26
  • As I said, it looks okay as path to directory in ``Directory`` matches ``WSGIScriptAlias``, but you are better off using ``process-group`` and ``application-group`` options on ``WSGIScriptAlias``. You don't need ``user`` and ``group`` options on ``WSGIDaemonProcess`` as it defaults to Apache user already. – Graham Dumpleton May 04 '18 at 09:15
  • You can check it is correct using checks in http://modwsgi.readthedocs.io/en/develop/user-guides/checking-your-installation.html#embedded-or-daemon-mode and http://modwsgi.readthedocs.io/en/develop/user-guides/checking-your-installation.html#sub-interpreter-being-used – Graham Dumpleton May 04 '18 at 09:16
  • The fact that processes get restarted seem to be an issue for you. If you continue to use embedded mode you have no control over it and Apache can do it at any time. I already linked the video explaining why embedded mode is bad. Daemon mode also gives better control over recovering when your application gets stuck, embedded mode doesn't and a misbehaving application can take your site down much more easily with no way to recover. – Graham Dumpleton May 04 '18 at 10:03
  • No drama, I don't need it. – Graham Dumpleton May 07 '18 at 16:40
  • For future reference, could you just edit your answer to add precisions @Graham Dumpleton: is the original configuration of my question (`WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5` `WSGIScriptAlias / /home/www/wsgi_test/app.wsgi`) in Daemon or Embedded mode? – Basj Nov 28 '19 at 10:41
1

Edit: a more simple WSGI config is given in my question of Python WSGI handler directly in Apache .htaccess, not in VirtualHost


Based on Evhz's answer, I made a simple test to check that the processes are still running:

Apache config:

<VirtualHost *:80>
  ServerName example.com
  <Directory />
    AllowOverride All
    Require all granted
  </Directory>
  WSGIScriptAlias / /home/www/wsgi_test/app.wsgi
  WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=testwsgi
</VirtualHost>

app.wsgi file:

import os, time
from bottle import route, template, default_app

os.chdir(os.path.dirname(__file__))

@route('/hello/<name>')
def index(name):
    global i
    i += 1
    return template('<b>Hello {{name}}</b>! request={{i}}, pid={{pid}}',
        name=name, i=i, pid=os.getpid())

i = 0
time.sleep(3)     # wait 3 seconds to make the client notice we launch a new process!

application = default_app()

Now access http://www.example.com/hello/you many times.


The initial time.sleep(3) will help, from the client browser, to see exactly when a new process is started, and the request counter i will allow to see how many requests have been served by each process.

The PIDs will correspond to those listed by ps aux | grep testwsgi.

Also the time.sleep(3) will happen maximum 5 times (at the startup of each of the 5 processes), then the processes should run forever, until we restart/stop the server or modify the app.wsgi file (modifying it triggers a restart of the 5 processes, you can see new PIDs).


[I'll check that by letting my test run now, and access http://www.example.com/hello/you in 2 days to see if it's still a previously-launched process or a new one!]

Edit: the next day, the same processes were still up and running. Now, two days after, when reloading the same URL, I noticed new processes were created... (Is there a time after which a process with no request dies?)

Basj
  • Your latest case of the processes being restarted was likely not because of Apache/mod_wsgi itself, but because your operating system is configured with log file rotation for Apache and something is sending a signal to Apache once a day to force it to restart. If this is a problem, you should exclude Apache from the separate log file rotation system and use Apache's own log file rotation mechanism. – Graham Dumpleton May 01 '18 at 21:08
  • @GrahamDumpleton oh right! My Apache log rotation is configured once per month. And since we're May 1st, it's maybe this! Just being curious: why does log rotation make the wsgi processes restart? Both seem to be disconnected to each other... I'm curious about the reason! – Basj May 01 '18 at 21:33
  • Log file rotation sends a SIGHUP or SIGUSR1 to Apache, which causes it to re-read its configuration, shut down existing processes and start up new ones to replace them. As the mod_wsgi daemon processes are still managed under Apache, they are affected. – Graham Dumpleton May 01 '18 at 22:10