What could be causing a seemingly random AWS EC2 server crash? (Error establishing a database connection)

Question

To begin, I am running a WordPress site on an AWS EC2 Ubuntu micro instance. I have already confirmed that this is NOT an error with WordPress/MySQL.

Seemingly at random, the site goes down and I get the "Error establishing database connection" message. The server reports that it is running just fine, and rebooting usually fixes the issue. However, I'd like to find the cause and resolve it so this stops happening (for the past 2 weeks it has gone down almost every other day).

It's not a spike in traffic, or at least Google Analytics hasn't shown the site as having any spikes in traffic (it averages about 300 visits per day.)

What's the cause, and how can this be fixed?

cjg · Answer 1 · 2014-03-08T07:46:32.803

1

Sounds like you might be running into the CPU throttling that is a limitation of t1.micro. If you use too many CPU cycles, you will be throttled.

See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html#available-cpu-resources-during-spikes
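One way to look for throttling from inside the instance is to watch CPU "steal" time, the share of time the hypervisor withheld from the guest (the `st` column in top). That steal time is a useful signal for t1.micro throttling is my assumption here, not something the answer states. The following is a minimal Python sketch that computes the steal percentage between two samples of the aggregate `cpu` line in `/proc/stat`; the function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only sum the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total


def sample_steal(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    def read_first_line():
        with open(path) as handle:
            return handle.readline()

    before = read_first_line()
    time.sleep(interval)
    return steal_percent(before, read_first_line())
```

Calling `sample_steal()` on the instance while the site is slow or down, and again when it is healthy, would show whether steal time rises during the outages.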

edited Mar 08 '14 at 07:46

answered Mar 08 '14 at 07:21

cjg


I looked into this and it's actually a little odd and I'm not sure if this could be it or not. The site usually sits around 10-20% CPU usage, with occasional spikes to 40-60% and once or twice we hit 100%. The cut-offs though (going down to 1% usage) always happened after sitting around 40-50% usage. Oddly enough, the two times we hit 100% usage the server did not appear to be throttled. Could it be that micro instances are not allowed to use over 60% of the CPU? I guess this requires more Googling. Thank you. :) – cw02 Mar 08 '14 at 15:48

I should also add, the instance is always reported as "healthy." Would it do this while being throttled? – cw02 Mar 08 '14 at 15:57

score 0 · Answer 2 · edited May 23 '17 at 12:30


The next time this happens I would check some general stats on the health of the instance. You can get a feel for the high-level health of the instance using the 'top' command (http://linuxaria.com/howto/understanding-the-top-command-on-linux?lang=en). Be sure to look for CPU and memory usage. You may find a process (pid) that is consuming a lot of resources and starving your app.
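If an interactive top session is not practical during an outage, the memory figures can also be captured from `/proc/meminfo`. The sketch below is a rough Python illustration under that assumption; the function names are hypothetical, and the fallback estimate for kernels without `MemAvailable` (added in Linux 3.14) is only approximate.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Summarize total, available, and used memory from parsed meminfo values."""
    total = info["MemTotal"]
    # Older kernels lack MemAvailable; approximate it from free + buffers + cache.
    fallback = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    available = info.get("MemAvailable", fallback)
    return {
        "total_mb": total / 1024,
        "available_mb": available / 1024,
        "used_percent": 100.0 * (total - available) / total,
        "swap_total_mb": info.get("SwapTotal", 0) / 1024,
    }


def read_memory_summary(path="/proc/meminfo"):
    """Read and summarize the live values (Linux only)."""
    with open(path) as handle:
        return memory_summary(parse_meminfo(handle.read()))
```

Logging `read_memory_summary()` periodically (for example from cron) would leave a record of memory pressure leading up to the next outage.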

More likely, something within your application (how did you come to the conclusion that this is not a Wordpress/MySQL issue?) is going out of control. Possibly there is a database connection not being released? To see what your app is doing, find the process id (pid) for your app:

ps aux | grep "php"

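and get a thread dump for that process. For a Java process, kill -3 <pid> prints a thread dump; note that this is JVM-specific, and sending the same signal to a PHP process will not produce a dump. A thread dump helps you see where your application's threads are stuck (if they are).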

Typically it's good practice to execute two thread dumps a few seconds apart and compare trends in both. If there is an issue in the application, you should see a lot of threads stuck at the same point.

You might also want to check out what MySQL is seeing (https://dev.mysql.com/doc/refman/5.1/en/show-processlist.html).

mysql> SHOW FULL PROCESSLIST;
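To make the process list easier to read, the rows can be summarized by command and checked for connections that have been idle for a long time, which is one sign of connections not being released. The Python sketch below assumes the rows have already been fetched as dictionaries keyed by the statement's column names (`Id`, `Command`, `Time`); how they are fetched, the function name, and the 60-second threshold are all illustrative assumptions.

```python
from collections import Counter


def summarize_processlist(rows, idle_seconds=60):
    """Count connections per command and list long-idle 'Sleep' connections.

    `rows` is an iterable of dicts with at least 'Id', 'Command', and 'Time'
    keys, matching the columns of the process list output.
    """
    rows = list(rows)
    by_command = Counter(row["Command"] for row in rows)
    long_idle_ids = [
        row["Id"]
        for row in rows
        # Time can be NULL for some threads; treat that as zero seconds.
        if row["Command"] == "Sleep" and (row["Time"] or 0) >= idle_seconds
    ]
    return {
        "total": len(rows),
        "by_command": dict(by_command),
        "long_idle_ids": long_idle_ids,
    }
```

A total that sits near the server's connection limit, with most entries in `Sleep`, would point toward the application holding connections open rather than toward a CPU problem.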

Hope this helps, let us know what you find!

edited May 23 '17 at 12:30

Community


answered Mar 08 '14 at 01:02

pherris


First off, thank you so much for your quick reply! I'm going to check the things you suggested. I did want to reply really quick just to clarify the Wordpress thing - there's a lot that can go wrong and I guess I should have phrased it better. Many of the responses to people that came up when I was Googling the issue said to check the wp-config.php file. I've done this and it's just fine. Whenever the error happens rebooting the instance always fixes it, for a while. The site was working just fine for its first two months, then this all of a sudden. – cw02 Mar 08 '14 at 11:43
Okay, time for a proper response. According to CloudWatch, everything is always 'healthy' (even when the site is down). I don't have access to memory usage, as I have struggled to get that added to CloudWatch. The CPU usage usually sits around 10-20%, with occasional spikes up into 40-60%. Very rarely will it go all the way up to 100%. Usually the downtime occurs after a spike up to 40-60%, and the only way I know to resolve the issue temporarily is to reboot the instance (and then it is fine for another day before it happens all over again). – cw02 Mar 08 '14 at 16:20
So for the thread dump... I don't know how to read this. :\ Also, should it be run during a crash or can it be run at any time? I guess most of these questions also apply to MySQL. Although, I also don't know how to check the MySQL process list. ALSO: I was finally able to get the memory data to report in CloudWatch, however I had to create a new instance. So on that end I suppose it is a wait-and-see. – cw02 Mar 08 '14 at 17:55

@cw02 Did you ever solve this issue? I'm going through the exact same thing in 2019. – Dante Cullari Feb 26 '19 at 05:45

@DanteCullari Unfortunately, no I didn't. I wound up switching entirely to Digital Ocean since they allow you the same amount of control as AWS but with a much better platform and tech support. – cw02 Feb 26 '19 at 11:39
I've had to increase the instance to a t3.xlarge. I think I was getting limited because I had just installed WordPress, so network capacity was irregular. I may be able to downgrade again later and not get limited by AWS. – Dante Cullari Feb 28 '19 at 08:40
