I've been running two identical medium CPU instances on Amazon behind a load balancer for a few months. I've noticed that the load balancer regularly declares an instance unhealthy, takes it down, and replaces it with a new instance of the defined AMI.
That's technically the correct thing to do; I just don't understand why it occasionally thinks the instance is unhealthy. I've been monitoring the health check ports over the last 3 days, and a check every 60 seconds against the public DNS of each instance consistently succeeds. Over that same period, the load balancer has declared an instance unhealthy 3 times and replaced it. The instances are deliberately massively overpowered for what I need, so I can rule out load as the cause.
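For reference, the kind of external check described above can be sketched roughly as follows. This is a minimal illustration, not the exact script in use: the function names `probe` and `monitor`, the 5-second timeout, and the assumption that only an HTTP 200 counts as healthy are all placeholders. Recording the response time of each probe matters because a response that is correct but slow could still count as a failure for a checker with a short timeout.

```python
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=5.0):
    """Make one GET request and return (ok, seconds_elapsed, detail).

    Assumption: only an HTTP 200 counts as healthy.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
        return status == 200, time.monotonic() - start, f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, time.monotonic() - start, f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, DNS failure, timeout, dropped connection, etc.
        return False, time.monotonic() - start, f"error: {exc}"


def monitor(url, checks, interval=60.0, timeout=5.0):
    """Probe `url` `checks` times, print one timestamped line per probe,
    and return the number of failed probes."""
    failures = 0
    for i in range(checks):
        ok, elapsed, detail = probe(url, timeout)
        if not ok:
            failures += 1
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        label = "OK" if ok else "FAIL"
        print(f"{stamp} {label} {elapsed:.3f}s {detail}", flush=True)
        if i < checks - 1:
            time.sleep(interval)
    return failures
```

Calling `monitor("http://<instance-public-dns>:8080/", checks=1440)` would cover one day at the 60-second interval; the host and path here are placeholders. Timestamps are logged in UTC so they can be lined up against the times at which the load balancer reports an instance as unhealthy.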
With the ELB architecture, I know this doesn't technically matter, but the rate of unhealthy declarations has gone from one per week to more than one per day. Each replacement instance that gets spun up costs me an extra hour of instance time. If this gets worse, the cost will become non-trivial, but more importantly it doesn't give me faith in the ELB internals.
This isn't the same question as this one; mine is an occasional failure. For information, I'm using the EU/Ireland data center, and my unhealthy criterion is 10 failures on my port (8080) over a 5-minute period. That is already longer than I'd really like, since I don't want traffic going to an instance that has been failing to respond for 5 minutes.
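To make the criterion above concrete, here is a small sketch of how a threshold of this kind behaves. It rests on two assumptions that are not stated outright above: that the failures must be consecutive, and that the checks run every 30 seconds (which is what 10 failures in 5 minutes would imply). The function names are made up for illustration.

```python
def seconds_until_unhealthy(interval_s, unhealthy_threshold):
    """Time a completely dead instance keeps receiving traffic before
    enough consecutive checks have failed to mark it unhealthy."""
    return interval_s * unhealthy_threshold


def first_unhealthy_index(results, unhealthy_threshold):
    """Return the index of the check at which the instance would be marked
    unhealthy, or None if the threshold is never reached.

    `results` is a sequence of booleans (True = check passed).
    Assumption: any passing check resets the failure streak.
    """
    streak = 0
    for i, ok in enumerate(results):
        streak = 0 if ok else streak + 1
        if streak >= unhealthy_threshold:
            return i
    return None


# 10 consecutive failures at an assumed 30-second interval is 5 minutes.
print(seconds_until_unhealthy(30, 10))

# Nine failures followed by a single success never trip the threshold.
print(first_unhealthy_index([False] * 9 + [True] + [False] * 9, 10))
```

Under these assumptions an instance is only marked unhealthy after a full 5 minutes of uninterrupted failed checks, so isolated failures should not be enough to trigger a replacement.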
I know someone is going to suggest contacting Amazon, but I don't have a support contract, and anyone who's tried this knows the kind of answer I'll get, if I get one at all. I really like the idea of this thing; it just doesn't seem that stable to me.