
We are migrating from Linode to AWS and are seeing an issue where sessions are not persisting as expected.

On Linode we had an instance running Twitter's twemproxy, which handled the addition and removal of nodes in the cache pool, and sessions persisted as expected. If a node was removed (outage), a session would of course be regenerated; otherwise, twemproxy knew how to retrieve the session data from the online and registered cache nodes.

However, on AWS we have an ElastiCache cluster set up as follows:

  • 3 nodes total
  • 2 nodes in AZ1
  • 1 node in AZ2
  • Open security group (in a private subnet with no outside access)

We set up DNS to point 'vcache' at the ElastiCache endpoint so that we don't have to manage any cache hostnames at the app layer; the app only knows to connect to vcache:22122.
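For illustration, that indirection can be a simple CNAME in the internal zone (the hostnames below are hypothetical; the record would point at whichever ElastiCache endpoint the app should use):

```
; hypothetical internal zone entry mapping the app-facing name
; to the ElastiCache endpoint
vcache  300  IN  CNAME  mycluster.abc123.cfg.use1.cache.amazonaws.com.
```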

For some reason, session IDs are being irregularly regenerated, and we are not sure why. Since ElastiCache presents a 'black box' interface, we assumed it would know how to retrieve a key from the particular cache node that holds it, as twemproxy does, but this doesn't seem to be the case.

Did we miss something in the setup, or are we incorrectly assuming that ElastiCache is a proxy that knows how to access data, even across AZ nodes?

– Mike Purcell

2 Answers


AWS ElastiCache will not redirect a request to the right node; it is not an actual cluster in that sense, so the routing has to be done by the client. ElastiCache for memcached provides an auto-discovery endpoint, and that is all your client needs to know. Using that endpoint, the client library finds out the details of the cluster: the nodes and their endpoints. However, you need to use the client library provided by AWS in order to get auto discovery automatically.
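As a minimal sketch, this is roughly what that looks like with the AWS ElastiCache Cluster Client for Java (a drop-in fork of spymemcached that understands the auto-discovery endpoint); the configuration endpoint hostname and key are hypothetical:

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class SessionCacheExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's *configuration endpoint*
        // (hostname below is hypothetical). With the AWS cluster client
        // on the classpath, this same constructor polls that endpoint,
        // discovers every node, and keeps the node list current as
        // nodes are added or replaced.
        MemcachedClient client = new MemcachedClient(
                new InetSocketAddress("mycluster.abc123.cfg.use1.cache.amazonaws.com", 11211));

        // Keys are hashed client-side to pick the owning node, so gets
        // and sets route consistently without a proxy like twemproxy.
        client.set("session:abc123", 3600, "serialized-session-data");
        Object session = client.get("session:abc123");
        System.out.println(session);

        client.shutdown();
    }
}
```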
The advantage of using ElastiCache is that it automatically replaces failing nodes, so you don't have to worry about that, and it takes care of the operational work: patches, upgrades, etc.

– Liviu Costea
  • Ok, this makes sense. It does appear to be an app-layer issue in that my code is not attempting to retry a get/set. I will check this out, thanks. – Mike Purcell Nov 02 '15 at 21:23

The issue ended up being related to this post: Twitter - twemproxy - memcached - Retry not working as expected

Basically, I had configured twemproxy with two servers, vcache-1 and vcache-2, but vcache-2 wasn't initialized yet, so it was constantly being ejected, as expected. The issue, however, was that whenever twemproxy re-introduced vcache-2 back into the pool, requests hashed to it would break, even after retry attempts. It seems that even with the app issuing retries on the cache set, twemproxy wasn't able to eject the dead node again and move on to a still-working node.
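For reference, the eject/re-introduce behavior is driven by the pool settings in the nutcracker config; a sketch of the setup described above (hostnames hypothetical) would look like:

```yaml
vcache:
  listen: 0.0.0.0:22122
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true        # drop a failing server from the pool...
  server_retry_timeout: 30000   # ...then retry it after 30s, re-introducing it if it responds
  server_failure_limit: 3       # eject after 3 consecutive failures
  servers:
    - vcache-1:11211:1
    - vcache-2:11211:1
```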

– Mike Purcell