I have two Rails apps (production and staging environments) on a remote server.

I am experiencing a strange problem where Puma sometimes times out after I finish a deployment (via cap deploy). This has been happening for quite some time and is getting more frequent. Whenever it happens, I need to restart the Puma server, either with cap puma:stop followed by cap puma:start, or by manually running kill -9 <pid of puma instance>. In both cases I first need to rm puma.sock from the shared/tmp/sockets directory.

My production environment, on the other hand, does not have this issue. The only difference between the two is the number of commits: staging is several (~50) commits ahead. Earlier, when I merged staging into production and deployed, the same problem appeared in production. I rolled production back to the previous revision, restarted Puma, and the problem went away.

Note: cap puma:restart somehow does not solve this; I have to kill the current Puma instance and start a new one to make the problem go away.

My current setup is:

  • Rails 4.1
  • Puma
  • Nginx
  • Capistrano 3

At the time the error occurred, nothing was written to the Rails log, but Nginx logged these errors:

  • upstream timed out (110: Connection timed out) while reading response header from upstream: after waiting for 60 seconds, the 500 page is shown.
  • recv() failed (104: Connection reset by peer) while reading response header from upstream: the 500 page is shown instantly.
  • connect() to unix:/var/deploy/medictrust-staging/shared/tmp/sockets/puma.sock failed (111: Connection refused) while connecting to upstream: the 500 page is shown instantly.

The errors above occur randomly; sometimes it's connection timed out, sometimes it's connection refused. The most frequent one is connection timed out.
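To tell the "connection refused" case apart from the others, it can help to probe the socket file directly. The following is a minimal sketch using only Ruby's standard library; the helper name `socket_state` and its return symbols are made up for illustration. It assumes a socket file that exists but refuses connections is a leftover from a dead Puma process, which would match the need to rm puma.sock before starting again.

```ruby
require "socket"

# Hypothetical helper: classify a Unix socket path.
#   :missing   - no file at that path
#   :stale     - file exists but nothing is listening (ECONNREFUSED)
#   :listening - a process accepted the connection attempt
def socket_state(path)
  return :missing unless File.exist?(path)

  UNIXSocket.new(path).close
  :listening
rescue Errno::ECONNREFUSED
  :stale
rescue Errno::ENOENT
  # The file disappeared between the existence check and the connect.
  :missing
end
```

For example, `socket_state("/var/deploy/medictrust-staging/shared/tmp/sockets/puma.sock")` could be run on the server after a failed deploy. Note that `:listening` only means the connection was accepted; it says nothing about whether Puma will answer in time, so it does not rule out the timeout case.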

The strange thing is that Puma does not time out when I access my application via cURL. No changes were made to the Puma or Nginx config, so is it possible that this is caused by application code?

How do I make this problem go away for good?

dvdchr
  • Did you ever figure this one out? I'm dealing with something similar, but only in my production environment. https://xkcd.com/979/ – steel Feb 17 '16 at 21:08
  • Hey @steel, if I remember correctly, the web server was timing out because there were long-running (read: stuck) queries all over the database. The available database pool was exhausted, and Puma kept waiting for the stuck queries to finish. To check whether my problem is similar to yours, simply login to your production database and run `SHOW FULL PROCESSLIST`. Here's an SO answer that helped me: http://stackoverflow.com/a/15252722/1266558 – dvdchr Feb 18 '16 at 18:13

1 Answer

For me, the web server was timing out because there were long-running queries all over the database. They hogged the available connections, so Puma had to wait for a connection to become available.
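The effect can be illustrated with a toy model. This sketch is not ActiveRecord's connection pool; `TinyPool` and `simulate_request` are hypothetical names, and the pool size and wait time are arbitrary. It only shows why a request stalls once every connection is held by a query that never finishes.

```ruby
require "timeout"

# Toy connection pool: a fixed number of connection tokens in a thread-safe queue.
class TinyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Block until a connection is free, giving up after `wait` seconds.
  def checkout(wait)
    Timeout.timeout(wait) { @slots.pop }
  end

  def checkin(conn)
    @slots << conn
  end
end

# Simulate one incoming request while `stuck_queries` connections are
# held by queries that never return them to the pool.
def simulate_request(pool_size:, stuck_queries:, wait: 0.1)
  pool = TinyPool.new(pool_size)
  [stuck_queries, pool_size].min.times { pool.checkout(wait) }

  conn = pool.checkout(wait) # the new request needs a connection too
  pool.checkin(conn)
  :served
rescue Timeout::Error
  :timed_out
end
```

With a pool of 2 and 1 stuck query the request is served; with 2 stuck queries it waits for the full timeout and fails. From the outside, that waiting looks like the upstream simply not responding.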

As first aid, I restarted my MySQL server and everything worked again instantly. I regret that I didn't log slow queries, because the stuck queries must have been caused by some bad code in my Rails app.
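Before restarting MySQL, the offending queries can be captured from the output of `SHOW FULL PROCESSLIST`. Below is a minimal sketch that filters such rows in plain Ruby. It assumes the rows have already been fetched as hashes keyed by the processlist column names (`Id`, `Command`, `Time`, `Info`); the helper `long_running` and the sample rows are invented for illustration.

```ruby
# Select non-idle processlist rows that have been running for at least
# `min_seconds`, longest first. Idle connections report Command == "Sleep".
def long_running(rows, min_seconds)
  rows.select { |row| row["Command"] != "Sleep" && row["Time"].to_i >= min_seconds }
      .sort_by { |row| -row["Time"].to_i }
end

# Invented sample data in the shape of SHOW FULL PROCESSLIST rows.
SAMPLE_ROWS = [
  { "Id" => 7,  "Command" => "Query", "Time" => 95,  "Info" => "UPDATE orders SET ..." },
  { "Id" => 9,  "Command" => "Sleep", "Time" => 300, "Info" => nil },
  { "Id" => 12, "Command" => "Query", "Time" => 640, "Info" => "SELECT * FROM reports ..." },
  { "Id" => 15, "Command" => "Query", "Time" => 2,   "Info" => "SELECT 1" }
].freeze

long_running(SAMPLE_ROWS, 60).each do |row|
  puts "#{row['Id']}\t#{row['Time']}s\t#{row['Info']}"
end
```

Saving this output before the restart would at least keep a record of which statements were stuck, even without the slow query log enabled.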

Additionally, this SO answer also helped: Getting “Lock wait timeout exceeded; try restarting transaction” even though I'm not using a transaction

dvdchr