
I need an implementation of an atomic counter in the cloud that produces a serial integer for concurrent connections. The business case behind it is a tracking server.

Requirements by priority:

  1. (MUST) Durable - once a client gets a number, no other client will ever get that same number. No duplicates.
  2. (MUST) Scalable - the current load is 10K/second, growing to 1M/second in the future, from 200-1000 concurrent client connections. Incrementing by a chunk of 100 is a scalability feature (see requirement 4).
  3. (MUST) Latency < ~15ms on average (Postgres/MySQL/Redis are great; HTTP-level latency like DynamoDB's is out of the question). This is just to filter out slow solutions.
  4. (nice to have) Increment by chunk - a scalability feature where the client increments by a chunk (e.g. 100) and manages handing out the individual numbers in the application's memory.
  5. (nice to have) A fair price: < $150/month for 5K/s, with leaner pricing growth beyond that.
  6. (nice to have) HA (high availability) - I can tolerate 0.01% failures, but durability is essential: no duplicate numbers.

My alternatives are:

  1. A Postgres sequence: CREATE SEQUENCE serial CACHE 100; SELECT nextval('serial'); - $140/month for a Multi-AZ AWS RDS db.m3.medium. Not as fast as Redis, but I think it averages < 7ms. The CACHE option is a strong feature that should boost performance.
  2. Redis INCR with Redis Sentinel / ElastiCache Multi-AZ - cache.m3.medium Multi-AZ, $120/month. Durability is in question.

Redis has INCRBY, while Postgres only has the sequence's CACHE feature, which still requires a round trip to the DB for every nextval call.
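
For concreteness, here is a minimal sketch of what each alternative looks like from the application side. This is illustrative Python (psycopg2 and redis-py); the endpoints, credentials, and the sequence/key names are placeholders:

```python
import psycopg2   # pip install psycopg2-binary
import redis      # pip install redis

# --- Alternative 1: Postgres sequence with CACHE ---
# One-time setup (placeholder sequence name):
#   CREATE SEQUENCE tracking_id_seq CACHE 100;
pg = psycopg2.connect(host="my-rds-endpoint", dbname="tracking",
                      user="app", password="secret")
pg.autocommit = True  # nextval changes are never rolled back anyway

def next_id_postgres():
    # One round trip per ID; CACHE 100 only pre-allocates values inside
    # the server session, it does not remove the round trip.
    with pg.cursor() as cur:
        cur.execute("SELECT nextval('tracking_id_seq')")
        return cur.fetchone()[0]

# --- Alternative 2: Redis INCR ---
r = redis.Redis(host="my-redis-endpoint", port=6379)

def next_id_redis():
    # INCR is atomic, so every caller gets a distinct value.
    return r.incr("tracking:id")
```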

Any input regarding those two alternatives, or others?

naviram
  • Does this need to be in a database? Can't you build your own "counter service" using synchronization primitives present in most programming platforms? – Dai Oct 23 '16 at 01:50
  • It needs to be durable and persisted like a DB, and to have TCP-socket-like latency, so it is hard to get without a DB. In-memory alone won't work, and high performance, HA, zero-downtime deployment, and managing highly concurrent file writes without a DB are hard - you end up building many DB features. I'd rather buy them :) – naviram Oct 23 '16 at 02:04
  • Maybe you should use GUIDs instead? – Dai Oct 23 '16 at 03:06
  • I don't think a single instance of Redis or Postgres can process 1M/s requests. You need some distributed method to generate unique IDs. Take a look at Twitter's Snowflake. It might help. – for_stack Oct 23 '16 at 06:37
  • You didn't specify that you have to store any data. If you just have to maintain the counter, you really should not do that (counting) in a DB. As Dai already said, a (not actually very complicated) service is really the best way to go (apart from implementing it at the protocol stack). Databases are not the only things that understand locking/transactions; they rely on the underlying system to lock, and your service can use that too. On pricing: good that it is just a nice-to-have - for 1M/s you can easily add one or two 0s. (But cost pressure might lead to other concepts, e.g. clients getting a range of 100(0).) – Solarflare Oct 23 '16 at 09:40
  • Good points, everyone. Dai - GUIDs are something I have considered, but they don't match my use case. Solarflare - indeed, the < $150 is under the assumption of a 1000-id range. I have another scenario with a db.m3.medium Postgres RDS that holds 1K/s, and if I implement the 1000 range it seems I can easily scale x1000 :) – naviram Oct 24 '16 at 20:15
  • for_stack - thank you for sharing Snowflake; it's surely a good alternative for future traffic growth. – naviram Oct 24 '16 at 21:25

1 Answer


I think you are overestimating the risk of a Redis failure that leaves it unable to flush to disk, and underestimating the risk of any RDBMS doing the same. The risk can be mitigated in both cases by syncing writes to disk.

In Redis this means enabling AOF (Append Only File) persistence, as described in the persistence documentation you already linked to.
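
As a concrete sketch (the exact fsync policy is up to you), AOF can be enabled either in redis.conf or at runtime; here it is via redis-py, with the host name as a placeholder:

```python
import redis

r = redis.Redis(host="my-redis-endpoint", port=6379)

# Turn on the append-only file and fsync every write, trading some
# throughput for the strongest durability Redis offers. Equivalent to
# "appendonly yes" / "appendfsync always" in redis.conf; "everysec"
# is the usual compromise if "always" proves too slow.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "always")
```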

There is no need for any expiring-key trickery. The atomic behavior of INCR and INCRBY is more than sufficient to ensure uniqueness and durability, especially when combined with AOF persistence.

Redis is just about perfect for this use case. It is fast enough and scalable. Redis has been around a while now, and there are no legitimate durability concerns that wouldn't also be concerns for PostgreSQL or MySQL.

As noted by @Solarflare, having applications grab blocks of IDs at a time is even more cost-effective and scalable. This can be accomplished in Redis using INCRBY.
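
A rough sketch of that block-allocation pattern, assuming a Python client with redis-py (the key name and block size are illustrative): each client reserves a block of 100 IDs with a single INCRBY and then hands them out from memory.

```python
import redis

BLOCK_SIZE = 100  # the "increment by 100" chunk from the question

class BlockAllocator:
    """Reserve a block of IDs with one INCRBY, then serve them from memory."""

    def __init__(self, client, key="tracking:id"):
        self.client = client
        self.key = key
        self.next_id = 0
        self.block_end = 0  # exclusive upper bound of the current block

    def next(self):
        if self.next_id >= self.block_end:
            # INCRBY is atomic, so concurrent clients always get disjoint blocks.
            top = self.client.incrby(self.key, BLOCK_SIZE)
            self.block_end = top + 1
            self.next_id = top - BLOCK_SIZE + 1
        value = self.next_id
        self.next_id += 1
        return value

allocator = BlockAllocator(redis.Redis(host="my-redis-endpoint", port=6379))
print(allocator.next())  # one Redis round trip per 100 IDs
```

If a client crashes, the unused remainder of its block is simply skipped, so you get gaps in the sequence but never duplicates.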

Carl Zulauf
  • Carl, this is just the input I was personally looking for. I am going to accept this answer, and also credit for_stack and Solarflare for their contributions. – naviram Oct 26 '16 at 01:13
  • Redis seems to be a good candidate also due to its scalability features, for this scenario and others. – naviram Oct 26 '16 at 01:24
  • I also want to thank for_stack for mentioning distributed solutions for unique IDs like Twitter's Snowflake, and Solarflare for noting that my 100(0)-range suggestion is actually a cost-effective alternative to the distributed solutions when scaling up. – naviram Oct 26 '16 at 01:24
  • Made a small update to clarify how to do ranges in redis. – Carl Zulauf Oct 31 '16 at 04:31