How to create auto increment IDs in Cassandra

Question

We know that it is easy to create auto increment IDs in SQL databases, is there a good solution for it in Cassandra? The IDs should be for key or column name.

Related: [Cassandra: Generate a unique ID](http://stackoverflow.com/q/16084573/2390083) — Beryllium, Aug 19 '13 at 21:25
The moment you ask yourself, "how do I replicate X functionality from XYZsql database" you are almost guaranteed to be using cassandra wrong. Back up and examine your requirements again. — Michael, Aug 30 '14 at 22:05

score 42 · Answer 1 · edited Nov 16 '19 at 16:56

42

How about the following, using Cassandra's Lightweight transactions

1 - Create IDs table:

CREATE TABLE ids (
  id_name varchar,
  next_id int,
  PRIMARY KEY (id_name)
)

2 - Insert every id you'd like to use a global sequence with

For example:

INSERT INTO ids (id_name, next_id)
VALUES ('person_id', 1)

3 - Then, when inserting to a table where you'd like to use an auto-incremented key, do the following:

3.1 - Get the next_id from the ids table:

SELECT next_id FROM ids WHERE id_name = 'person_id'

Let's say the result is next_id = 1

3.2 - Increment next_id, the following way:

UPDATE ids SET next_id = 2 WHERE id_name = 'person_id' IF next_id = 1

The result should look like this:

[{[applied]: True}]

If it was updated successfully, OR

[{[applied]: False, next_id: 2}]

If someone else has already updated it.

So, if you got True, use id '1' - it is yours. Otherwise, increment next_id (or just use the returned next_id) and repeat the process.

edited Nov 16 '19 at 16:56

snovelli

5,804
2
37
50

answered Apr 01 '15 at 13:16

AlonL

6,100
3
33
32

How would you account for replication if you're using a cluster? – Panama Jack Apr 07 '18 at 01:32
If we put this in a separate spring boot service, will this work with multiple instances? – Saurabh Dec 19 '18 at 21:34
1

this is good solution however keep in mind that if you have many concurrent threads generating the ID, you might have lots of "retries" and it will eat lots of connections from the pool. So be careful with this approach. – walv Dec 05 '19 at 21:33
4

This pattern is called `Optimistic Locking` – Maxim Jun 02 '20 at 10:55
1

i would use counters – Kat Lim Ruiz Sep 17 '20 at 20:09
Regarding the two parts in: "Otherwise, increment next_id (or just use the returned next_id) and repeat the process.": "Repeat the process" refers to the next time one wants to get another id ( (different from the current transaction/flow, in which the id would already have been satisfactorily obtained by the first part increment next_id (or just use the returned next_id)? – PCH Mar 28 '23 at 17:34
@PCH No, if you got "False", meaning it did not apply - you were not able to obtain the next ID, so you should increment the number and try again. – AlonL Apr 12 '23 at 05:05

score 42 · Answer 2 · edited Feb 09 '18 at 19:06

42

Creating a global sequential sequence of number does not really make any sense in a distributed system. Use UUIDs.

(Because you would have to make all participants agree and accept the evolution of the sequence -- under a naive implementation)

edited Feb 09 '18 at 19:06

rogerdpack

62,887
36
269
388

answered Oct 21 '10 at 11:40

Luis Matta

445
3
2

3

Cassandra supports counters (https://cassandra.apache.org/doc/cql3/CQL.html#counters), there are some limitations though. – Piyush Kansal Jan 17 '15 at 00:31
1

Short preview: primary limitation is that all other columns in your Cassandra table need to be part of the primary key. – Nishant Kelkar Feb 08 '16 at 22:22
https://cassandra.apache.org/doc/cql3/CQL.html#counters This link is dead – Harish Jan 15 '17 at 00:37
1

I totally agree with the fact that you should use UUIDs, however if you replace some legacy DB with Cassandra and you need to maintain numbers as IDs because numbers are used everywhere across the app and also are part of your API, then UUIDs won't work – walv Dec 05 '19 at 21:36
I also agree with your statement, but it doesn't help with the OP's question... and there are some good reasons that you might need an incrementing counter, even in a distributed DB. I just had a situation where a PO didn't like seeing UUIDs on the UI and asked for a "prettier" unique identifier, so we introduced an incrementing counter (which isn't the PK, but satisfies the requirement). Introducing new persistence services was *not* an option at this point. It may happen to you!!! – Tom Jun 29 '20 at 20:38
Using Auto incremented IDs to display to the user as a sequenced number is not good idea, And it should be done manually, Because sometime gap happened between auto incremented IDs and you can't explain these gaps to the user. – adramazany May 05 '23 at 09:25

score 16 · Answer 3 · answered Oct 21 '10 at 10:19

There is no good solution.

Create a column with a number, increase the number and save it to all replicas together with a temporary id, read all replicas and check if the temporary id is "yours", if not do it again.. not a great solution and will not scale.

or

Build your own id service where you fetch your next id. This service will only be run in a single instance and will be a non scaling scary factor.

As soon as anything goes beyond a single instance the sequencing of id's gets complicated, at least if you want it to scale. That includes relational databases.

By "id service" I presume you mean "outside of Cassandra" is that right? — rogerdpack, Feb 08 '18 at 23:20

score 6 · Answer 4 · answered Jun 23 '15 at 11:43

there is a counter datatype which can be used. Consider the below example.

CREATE KEYSPACE counterks WITH REPLICATION =
{ 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };

Create a table for the counter column.

CREATE TABLE counterks.page_view_counts
(counter_value counter,
url_name varchar,
page_name varchar,
PRIMARY KEY (url_name, page_name)
);

Load data into the counter column.

UPDATE counterks.page_view_counts
SET counter_value = counter_value + 1
WHERE url_name='www.datastax.com' AND page_name='home';

Take a look at the counter value.

SELECT * FROM counterks.page_view_counts;

Output is:

 url_name         | page_name | counter_value
------------------+-----------+---------------
 www.datastax.com |      home |             1

Increase the value of the counter.

 UPDATE counterks.page_view_counts
 SET counter_value = counter_value + 2
 WHERE url_name='www.datastax.com' AND page_name='home';

Take a look at the counter value.

 url_name         | page_name | counter_value
------------------+-----------+---------------
www.datastax.com |      home |             3

Refer this for more details: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html

Unfortunately there doesn't appear to be a "get and increment" option for counters [?] so it's subject to race conditions, see the comments to AlonL's answer. — rogerdpack, Feb 08 '18 at 23:20

Basav · Answer 5 · 2018-06-05T07:24:22.677

I think IMHO expecting Cassandra to provide an auto-incrementing field is WRONG

Cassandra is a elegant decentralised database, hence expecting it to provide a auto-incrmenting field is, taxing and defeats the original purpose, because this value has to be then maintained in a central place

Hence, don't make any solution which DB based to get a auto-incrementing number

Instead generate the ID in the code or service in your app, which can keep generating random unique IDs and use that to apply on your data model, this way the objective & benefit of Cassandra will not be defeated

score 3 · Answer 6 · answered May 11 '13 at 17:43

This question is pretty old but I'd like to complete it with an other solution.

Any solution that relies on nodes synchronization is unreasonable. It's pretty sure to break either by blocking IDs generation or by creating duplicate IDs.

MySQL way

You can reproduce the way it's done with the mysql master-master replication with the auto_increment_increment and auto_increment_offset parameters.

To reproduce it, you need to know the number of nodes or the max number of expected nodes and you need to create a (non-cassandra) counter (a file per example) on each node.

Each time you want to generate a new number, you find the current value, add the increment and save it. If it doesn't exist yet, it's the offset.

So for 10 nodes, you would have an increment of 10 and an offset of 1 for the first node, 2 for the second node, etc. Node 1 would create the IDs 1, 11, 21. Node 2 would create the IDs 2, 21, 22.

If you want your IDs to be (approximatively) ordered between nodes, you need to maintain a shared counter and make sure each generated ID is higher than the shared counter. That way, unless your nodes/datacenters are out of sync for a long time, you shouldn't notice much difference.

Prefixing

You can do basically the same thing by prefixing the ID (if it's an acceptable solution) with the node number (or name). And you don't have to known the number of nodes. Node 1 would create 1_1, 1_2, 1_3. Node 2 would create 2_1, 2_2, 2_3.

AlonL · Answer 7 · 2015-03-31T11:21:30.673

2

Edit: This solution is not proper. See the first comment.

My solution:

1 - Create IDs table:

CREATE TABLE ids (
  id_name varchar,
  next_id counter,
  PRIMARY KEY (id_name)
)

2 - When inserting to a table where you'd like to use an auto-incremented key, do the following:

2.1 - Increment counter (it will be created if not exists), using the highest consistency level

UPDATE ids
  SET next_id = next_id + 1
  WHERE id_name = $AUTO_INCREMENTED_ID
  USING CONSISTENCY ALL

2.2 - Get the new id value:

SELECT next_id
  FROM ids
  WHERE id_name = $AUTO_INCREMENTED_ID

2.3 - Insert the value with the auto-incremented id

INSERT INTO some_table ($AUTO_INCREMENTED_ID, ...)
  VALUES ($RESULT_FROM_PREVIOUS_QUERY, ...)

Words starting with '$' in my answer are self-explanatory (I hope) placeholders...

Of course this is not a recommended method. Use it only if you have to.

edited Mar 31 '15 at 11:21

answered Mar 30 '15 at 15:57

AlonL

6,100
3
33
32

3

This will not work: 2 requests might increment the same counter simultaneously on step 2.1. As a result on step 2.2 they will get the same next_id. This is rare situation but possible. – Stanislav Berkov Mar 31 '15 at 10:44
You're absolutely right, thanks. I've added a comment to the answer, but left it because maybe someone can still find it useful. – AlonL Mar 31 '15 at 11:22
@StasBerkov please see my new answer. I'd be happy to hear your opinion. – AlonL Apr 01 '15 at 13:17
I like your proposal. – Stanislav Berkov Apr 01 '15 at 14:24

score 1 · Answer 8 · answered Feb 18 '15 at 18:08

Do they really need to be sequential, or do you just need counting numbers that are much smaller than a UUID that are easily entered by a person?

If you really need sequential numbers, then you will need to do one of the following.

Have a table in cassandra where the key/id is a field for the generator, and the value is a number... do conditional updates in a loop until you successfully increment the count. (bad idea)
Have a generator service that will give you the next number. This can only run on a single system and be a single point of failure, but depending on your needs this might be best.

Alternatively... Similar to the first, but get batches of 100 or so numbers at a time, and deal those out inside your process/thread... This will have less contention, but no guarantee of sequential order, only uniqueness... If you only want shorter numbers that are unique for display, this may be your best bet.

How to create auto increment IDs in Cassandra

8 Answers8

1 - Create IDs table:

2 - Insert every id you'd like to use a global sequence with

3 - Then, when inserting to a table where you'd like to use an auto-incremented key, do the following:

3.1 - Get the next_id from the ids table:

3.2 - Increment next_id, the following way:

MySQL way

Prefixing

Linked