Issue I Was Having
I was having an issue where some sites were taking a long time to load (by "long time" I mean up to 16 seconds). Sometimes they would time out entirely, which generated an Nginx 504 error. Usually, when a site timed out, I could reload it and it would load quickly. The site I was having issues with gets very little traffic. I'm testing by loading the Django admin index page, to rule out slowness caused by poor code of my own. It should also be noted that this particular site only uses the Django admin, because it's an intranet-type site for staff only.
Hosting Setup
All the sites I'm hosting are on two Rackspace cloud servers. The first server is my app server, which has 1024 MB of RAM, and my second server is my database server, which has 2048 MB of RAM. The app server is serving up each site using Nginx, which serves all static files and proxies everything else to the Django Gunicorn workers for each site.
Looking at the database server's RAM and CPU load, everything seems fine on that server.
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1999       1597        402          0        200       1007
-/+ buffers/cache:        389       1610
Swap:         4094          0       4094
Top shows a CPU load average of: 0.00, 0.01, 0.05
In order to try and troubleshoot what is happening, I wrote a quick little script which prints out the memory usage on the app server.
Example printout, with the site domains anonymized:
Celery: 23 MB
Gunicorn: 566 MB
Nginx: 8 MB
Redis: 684 KB
Other: 73 MB
             total       used       free     shared    buffers     cached
Mem:           993        906         87          0         19         62
-/+ buffers/cache:        824        169
Swap:         2047        828       1218
Gunicorn memory usage by website:
site01.example.com 31 MB
site02.example.com 19 MB
site03.example.com 7 MB
site04.example.com 9 MB
site05.example.com 47 MB
site06.example.com 25 MB
site07.example.com 14 MB
site08.example.com 18 MB
site09.example.com 27 MB
site10.example.com 15 MB
site11.example.com 14 MB
site12.example.com 7 MB
site13.example.com 18 MB
site14.example.com 18 MB
site15.example.com 10 MB
site16.example.com 25 MB
site17.example.com 13 MB
site18.example.com 18 MB
site19.example.com 37 MB
site20.example.com 30 MB
site21.example.com 23 MB
site22.example.com 28 MB
site23.example.com 80 MB
site24.example.com 15 MB
site25.example.com 5 MB
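The script itself is not reproduced here. A minimal sketch of the same idea, assuming Linux `ps` and assuming that process names start with the prefixes listed in `CATEGORIES` (that mapping, and the function names, are hypothetical and would need adjusting for a real server), might look like this. It sums resident memory (RSS), which counts shared pages once per process, so the totals are an upper bound rather than an exact figure.

```python
import subprocess
from collections import defaultdict

# Hypothetical mapping: process-name prefix -> label used in the report.
CATEGORIES = {
    "celery": "Celery",
    "gunicorn": "Gunicorn",
    "nginx": "Nginx",
    "redis": "Redis",
}


def summarize(ps_output):
    """Sum resident memory in KiB per category from `ps -eo rss=,comm=` output."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        rss_kib = int(parts[0])
        name = parts[1].strip().lower()
        label = "Other"
        for prefix, candidate in CATEGORIES.items():
            if name.startswith(prefix):
                label = candidate
                break
        totals[label] += rss_kib
    return dict(totals)


def format_size(kib):
    """Render KiB as whole MB once the value reaches 1024 KiB."""
    return f"{kib // 1024} MB" if kib >= 1024 else f"{kib} KB"


def main():
    # `rss=` and `comm=` suppress the header line in procps `ps`.
    result = subprocess.run(
        ["ps", "-eo", "rss=,comm="],
        capture_output=True, text=True, check=True,
    )
    for label, kib in sorted(summarize(result.stdout).items()):
        print(f"{label}: {format_size(kib)}")
```

Calling `main()` on the app server prints one line per category. Depending on how Gunicorn is started, its workers may show up under a different process name (for example `python`), in which case the prefix table has to be changed.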
Example Gunicorn config file:
pidfile = '/var/run/gunicorn_example.com.pid'
proc_name = 'example.com'
workers = 1
bind = 'unix:/tmp/gunicorn_example.com.sock'
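Because a Gunicorn config file is ordinary Python, the worker count does not have to be a hard-coded number. As a sketch only: Gunicorn's documentation suggests (2 x cores) + 1 as a starting point for a single application that has the machine to itself. The helper name `suggested_workers` is hypothetical, and on a server shared by many small sites that rule would multiply memory use, so treat it as an upper bound to test against, not a recommendation for this setup.

```python
import multiprocessing


def suggested_workers(cores):
    """Gunicorn's documented starting point: (2 x cores) + 1."""
    return 2 * cores + 1


# Computed from the machine the config is loaded on.
# With many sites on one small server, a lower fixed value may be the safer choice.
workers = suggested_workers(multiprocessing.cpu_count())
```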
Example Nginx config:
upstream example_app_server {
    server unix:/tmp/gunicorn_example.com.sock fail_timeout=0;
}

server {
    listen 80;
    server_name example.com;
    access_log /var/log/nginx/example.com.access.log;
    error_log /var/log/nginx/example.com.error.log;

    location = /favicon.ico {
        return 404;
    }

    location /static/ {
        root /srv/sites/example/;
    }

    location /media/ {
        root /srv/sites/example/;
    }

    location / {
        proxy_pass http://example_app_server;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        client_max_body_size 10m;
    }
}
As you can see from the app server output, a lot of memory is swapped out (828 MB of swap in use), so to fix my issues I upgraded the RAM on the app server, which fixed the sites' slowness entirely. Even though I was able to fix the issue, it took me a lot longer than I would have liked, and I still feel like I was basically guessing at what was causing the slowness. All this leads me to my questions...
Questions
- How can you tell whether slowness on a low-traffic site is caused by inactivity, i.e. the site going idle so that Gunicorn has to load it again on the next request? Is there a setting to prevent a site from going inactive?
- It seems like I have some sites that are taking too much memory. What are some tools I could use to reduce how much memory a site is using? Should I use some Python profiling tools?
- What are some tools and steps to take in order to determine at what level in the stack the bottleneck is occurring?
- What is the best way to determine if it's your Gunicorn processes that are getting swapped or if it's other processes that are getting swapped?
- Most of the sites I'm hosting don't get a ton of traffic, so I'm using just one Gunicorn worker. Is there a more scientific way of determining and adjusting how many Gunicorn workers a site should have?
- When hosting multiple sites on the same server, are there ways to configure things to use less memory?
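On the question of which processes are being swapped, one approach I'm aware of is to read the `VmSwap` field that Linux exposes in `/proc/<pid>/status`. The following is a minimal sketch under that assumption (the function names are hypothetical, and it needs a kernel recent enough to report `VmSwap`). It lists processes by swapped-out memory, largest first.

```python
import os
import re


def parse_status(text):
    """Return (process name, swapped KiB) from the text of /proc/<pid>/status."""
    name = re.search(r"^Name:\s+(.+)$", text, re.MULTILINE)
    swap = re.search(r"^VmSwap:\s+(\d+)\s+kB$", text, re.MULTILINE)
    # Kernel threads have no VmSwap line; treat that as zero.
    return (name.group(1) if name else "?", int(swap.group(1)) if swap else 0)


def swap_by_process(proc_root="/proc"):
    """List (swapped KiB, pid, name) for every process with memory in swap."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, swap_kib = parse_status(fh.read())
        except OSError:
            continue  # the process exited, or we may not read it
        if swap_kib:
            rows.append((swap_kib, int(entry), name))
    return sorted(rows, reverse=True)
```

Printing the first few rows of `swap_by_process()` shows whether the Gunicorn workers or something else account for most of the swap. Running it as root avoids skipping processes owned by other users.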