This is a server running on Linux (CentOS I think) using Python 2.6.6 and Django 1.2.3.

The Python process running Django suddenly starts using 100% CPU and keeps doing so until it is restarted. This has happened only twice, both times within the last month. I haven't made any big changes to the code in about 7 months.

Looking at the console output, the server is not under any heavy load: only about 10 queries in the roughly 10 minutes leading up to the time I believe it started using 100% CPU. The only error printed is a broken pipe error, which I think may come from someone using the site after it had slowed down and then closing the connection.

I re-ran all of the queries from around the time it slowed down, and they all worked without any problems.

The server itself is still functional in a sense, but it is extremely slow. I have a series of tests which I run every day; they usually take around 7 minutes, but when the server is this slow they can take 2-3 hours.

If anyone has any ideas I would be very grateful.

Also, as you can probably tell, I am quite a newbie when it comes to these problems, so I would appreciate it if someone could recommend a good practice for monitoring this kind of activity.

Thanks for your time!

Below is the output I mentioned. The 100% CPU usage starts at ~4pm.

[23/Jul/2012 15:49:55] "GET /CFXsearch/?n=&v=all&e=2012Week30&c=all&r=all&p=all&run=all&pl=all&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 67228
[23/Jul/2012 15:50:00] "GET /CFXsearch/?n=&v=all&e=2012Week30&c=all&r=all&p=all&run=all&pl=RH5&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 33346
[23/Jul/2012 15:50:05] "GET /CFXsearch/?n=&v=all&e=2012Week30&c=all&r=all&p=all&run=all&pl=SLES10&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 33394
[23/Jul/2012 15:54:48] "GET /CFXsearch/?n=&v=all&e=2012Week30&c=all&r=all&p=all&run=all&pl=SLES11&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 33394
[23/Jul/2012 15:54:53] "GET /results/?n=&e=2012Week30&c=TGTest&pl=SLES11&p=single&p=defined&p=double&run=default&run=hpmpi&run=mpich&run=mpich2&run=Platform&run=pvm%20parallel&run=serial&sort=sortResult&d=&y=&submitOption=latestsearch&Soft=CFX HTTP/1.1" 200 19350
[23/Jul/2012 15:54:57] "GET /results/?n=&e=2012Week30&c=turboexamples&pl=SLES11&p=single&p=defined&p=double&run=default&run=hpmpi&run=mpich&run=mpich2&run=Platform&run=pvm%20parallel&run=serial&sort=sortResult&d=&y=&submitOption=latestsearch&Soft=CFX HTTP/1.1" 200 36729
[23/Jul/2012 15:59:40] "GET / HTTP/1.1" 200 11111
[23/Jul/2012 15:59:40] "GET /site_media/style.css HTTP/1.1" 304 0
[23/Jul/2012 15:59:45] "GET /CFXsearch/ HTTP/1.1" 200 25637
[23/Jul/2012 15:59:45] "GET /site_media/jquery-1.2.6.min.js HTTP/1.1" 304 0
[23/Jul/2012 15:59:45] "GET /site_media/sorttable.js HTTP/1.1" 304 0
[23/Jul/2012 16:00:04] "GET /CFXsearch/?n=&v=14.5&e=all&c=solver54&r=all&p=all&run=all&pl=all&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 402737
[23/Jul/2012 16:00:19] "GET /results/?n=&e=2012Week29&c=solver54&pl=SLES11&p=single&p=defined&p=double&run=default&run=hpmpi&run=mpich&run=mpich2&run=Platform&run=pvm%20parallel&run=serial&sort=sortResult&d=&y=&submitOption=latestsearch&Soft=CFX HTTP/1.1" 200 1557488
[23/Jul/2012 16:02:48] "GET /CFXsearch/?n=&v=14.5&e=all&c=solver54&r=all&p=all&run=all&pl=all&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=ex&sort=sortName&Soft=CFX HTTP/1.1" 200 408388
[23/Jul/2012 16:03:01] "GET /CFXsearch/?n=&v=14.5&e=all&c=solver54&r=all&p=all&run=all&pl=all&m=all&o=all&d1=&d2=&submitOption=comparing&compareBy=plat&sort=sortName&Soft=CFX HTTP/1.1" 200 402737
Traceback (most recent call last):
  File "/home/install2/testingDatabase/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 281, in run
    self.finish_response()
  File "/home/install2/testingDatabase/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 321, in finish_response
    self.write(data)
  File "/home/install2/testingDatabase/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 400, in write
    self.send_headers()
  File "/home/install2/testingDatabase/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 465, in send_headers
    self._write(str(self.headers))
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/socket.py", line 318, in write
    self.flush()
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/socket.py", line 297, in flush
    self._sock.sendall(buffer(data, write_offset, buffer_size))
error: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/SocketServer.py", line 560, in process_request_thread
    self.finish_request(request, client_address)
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/SocketServer.py", line 322, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/install2/testingDatabase/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 562, in __init__
    BaseHTTPRequestHandler.__init__(self, *args, **kwargs)
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/SocketServer.py", line 618, in __init__
    self.finish()
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/SocketServer.py", line 661, in finish
    self.wfile.flush()
  File "/home/install2/testingDatabase/Python-2.6.6/Lib/socket.py", line 297, in flush
    self._sock.sendall(buffer(data, write_offset, buffer_size))
error: [Errno 32] Broken pipe
[23/Jul/2012 16:09:59] "GET /PolyflowSummary/ HTTP/1.1" 200 7561
[23/Jul/2012 16:17:42] "GET /?soft=CFX HTTP/1.1" 200 11112
[23/Jul/2012 16:17:44] "GET /?soft=CFX HTTP/1.1" 200 11112
[23/Jul/2012 16:18:06] "GET /site_media/style.css HTTP/1.1" 200 432
[23/Jul/2012 16:18:23] "GET /site_media/style.css HTTP/1.1" 200 432
[23/Jul/2012 16:18:23] "GET /site_media/favicon.ico HTTP/1.1" 200 1718
Paul Ayling

1 Answer

You can connect to the running process and attach a debugger. I have done this before and it's very useful. My full notes are here, but the abridged version is:

  • install this so that gdb "understands" python

  • connect using gdb -p PID (PID from ps or similar)

  • generate a stack trace in gdb and you'll see exactly where you are eating up CPU.

Original credit: "Showing the stack trace from a running Python application" (in fact, after typing all that, maybe this is a dupe of the linked question? I guess the question is different, even if the answer is the same...)
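
As a complement to the gdb steps, here is a minimal in-process sketch of the same idea: have the server dump the stack of every thread when it receives a signal. This is not the gdb method described above, and the function names (`format_all_thread_stacks`, `install_stack_dump_handler`) and the choice of SIGUSR1 are illustrative assumptions, not part of Django or of the original notes. It relies only on the standard library calls `sys._current_frames()`, `traceback.format_stack()` and `signal.signal()`, and it has to be installed before the problem occurs.

```python
import signal
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a string showing the current stack of every thread in this process."""
    # Map thread ids to names so the dump is easier to read.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack returns a list of already newline-terminated strings.
        lines.extend(traceback.format_stack(frame))
        lines.append("\n")
    return "".join(lines)


def dump_stacks_handler(signum, frame):
    """Signal handler (hypothetical name): write all thread stacks to stderr."""
    sys.stderr.write(format_all_thread_stacks())
    sys.stderr.flush()


def install_stack_dump_handler(signum=signal.SIGUSR1):
    """Register the handler. Must be called from the main thread (Unix only)."""
    signal.signal(signum, dump_stacks_handler)
```

Assuming `install_stack_dump_handler()` is called once at startup from the main thread, running `kill -USR1 PID` against the busy process should print the stacks to the console where the server logs appear. One caveat: Python only runs signal handlers in the main thread, between bytecode instructions, so if the main thread is stuck inside C code the handler will not fire, and attaching gdb as described above is the more reliable option.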

andrew cooke