How to store a user entity with Cassandra ("PK decision unique id vs. email")

Question

This is my first post, and I am convinced there is a better solution than mine. My question is mainly a design question.

I use Spring Boot 2.1.x to store a user entity in a Cassandra database. This works well so far. It is stored with

java generated uuid
mail address
salted bcrypt password
some user defined types...

When somebody logs in, I receive the mail address and the password and have to look up the stored credentials.

For retrieving the user object I would expect some WHERE-clause with "select * from user where username = mail".

However, in this case mail must be the partition key in Cassandra. But I want the user to be able to change her/his mail address, and primary key columns of a Cassandra table cannot be updated, so mail cannot be part of the primary key.

My naive idea is to have an inverse table holding the pair (mail, java generated uuid), to look up the uuid by mail first, and then to load the user by that uuid.
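The inverse-table idea can be sketched in plain Java, with two in-memory maps standing in for the two Cassandra tables. This is only an illustration of the access pattern, not Spring Data Cassandra code: the class `UserDirectory`, its methods, and the table name `user_by_email` mentioned in the comments are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// In-memory stand-in for two Cassandra tables; every name here is hypothetical.
class UserDirectory {

    // Minimal user record: the id never changes, the email may.
    static final class User {
        final UUID id;
        String email;
        final String passwordHash;

        User(UUID id, String email, String passwordHash) {
            this.id = id;
            this.email = email;
            this.passwordHash = passwordHash;
        }
    }

    // Plays the role of the main table, keyed by the generated uuid.
    private final Map<UUID, User> usersById = new HashMap<>();
    // Plays the role of the assumed lookup table user_by_email (email -> uuid).
    private final Map<String, UUID> idByEmail = new HashMap<>();

    // Writes one row to each "table".
    UUID register(String email, String passwordHash) {
        if (idByEmail.containsKey(email)) {
            throw new IllegalArgumentException("email already registered: " + email);
        }
        UUID id = UUID.randomUUID();
        usersById.put(id, new User(id, email, passwordHash));
        idByEmail.put(email, id);
        return id;
    }

    // Login path: first resolve email -> uuid, then load the user by uuid.
    Optional<User> findByEmail(String email) {
        UUID id = idByEmail.get(email);
        return id == null ? Optional.empty() : Optional.ofNullable(usersById.get(id));
    }

    // Changing the email replaces the lookup row; the user's key stays the same.
    boolean changeEmail(String oldEmail, String newEmail) {
        UUID id = idByEmail.get(oldEmail);
        if (id == null || idByEmail.containsKey(newEmail)) {
            return false;
        }
        idByEmail.remove(oldEmail);
        idByEmail.put(newEmail, id);
        usersById.get(id).email = newEmail;
        return true;
    }

    public static void main(String[] args) {
        UserDirectory directory = new UserDirectory();
        directory.register("old@example.com", "bcrypt-hash-placeholder");
        System.out.println(directory.changeEmail("old@example.com", "new@example.com"));
        System.out.println(directory.findByEmail("new@example.com").isPresent());
    }
}
```

Against a real cluster, the two writes would go to different partitions, so they would presumably be grouped in a logged batch, and the uniqueness check on the email would need something like a lightweight transaction (`INSERT ... IF NOT EXISTS`); both are left out of this sketch.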

I am just learning how to handle Cassandra properly, but IMHO my design is crap.

This is what I have in my user bean.

@PrimaryKeyColumn(type = PrimaryKeyType.PARTITIONED, ordinal = 0, name = "id")
@JsonProperty("id")
private String id;

@PrimaryKeyColumn(type = PrimaryKeyType.CLUSTERED, ordinal = 1, name = "email")
@Email(message = "*Please provide a valid email")
@NotEmpty(message = "*Please provide an email")
@JsonProperty("email")
private String email;

Have you considered keeping mail as a normal column and using a secondary index on it? — Horia, Jan 28 '19 at 16:18
Dear Horia, many thanks for your feedback. This seems to work. I was not aware of this and now use a SASI index on the email field. Many thanks. — msek, Jan 28 '19 at 17:00
Secondary indexes are different from SASI. SASI indexes are currently marked as experimental and have some issues, so it is not advisable to use them in production. Regular secondary indexes have drawbacks as well: a query that relies on a secondary index is executed across many nodes, since each node keeps its own index for the data that it owns. There are also specific cases where secondary indexes should not be used: columns with high cardinality, columns with very low cardinality, and columns that are frequently updated or deleted. — Horia, Jan 28 '19 at 17:30
I would read some more on this matter - [Cassandra at Scale: The Problem with Secondary Indexes](https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes), [Cassandra Native Secondary Index Deep Dive](https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive) — Horia, Jan 28 '19 at 17:31
And some reading regarding SASI - [Cassandra SASI Index Technical Deep Dive](http://www.doanduyhai.com/blog/?p=2058) — Horia, Jan 28 '19 at 17:32
Dear Horia, many thanks for your feedback again. Of course, the mail column will have a high cardinality, but I did not expect my question to be so extraordinary. I thought the problem was quite simple. — msek, Jan 29 '19 at 08:03
BTW the article at pantheon.io covers my question as a use case. — msek, Jan 29 '19 at 08:12
Dealing with the same question - https://stackoverflow.com/questions/25124993/how-to-avoid-secondary-indexes-in-cassandra .. and concluding with the same proposal I did. :/ — msek, Jan 29 '19 at 08:36

score 0 · Answer 1 · answered Feb 12 '19 at 17:08

I just want to mention that this problem is essentially the use case that justifies materialized views in Cassandra. I previously tried to solve it with a custom aspect driven by annotations, but in the future I will use materialized views.
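As a rough sketch of what that could look like, the CQL statements below are held as Java string constants so they can be passed to whatever driver or template is in use. The names `UserSchema`, `user_by_email`, and the column list are assumptions, and the base table is assumed to use `id` as its only primary key column with `email` as a regular column, which differs from the bean shown in the question. It may also be worth checking the status of materialized views for your Cassandra version, since later releases flag them as experimental.

```java
// Hypothetical holder for the CQL statements; table, view and column names are assumptions.
class UserSchema {

    // Base table: the generated uuid is the only primary key column,
    // so the email stays a regular column that can be updated.
    static final String CREATE_USER_TABLE =
        "CREATE TABLE IF NOT EXISTS user ("
      + " id uuid PRIMARY KEY,"
      + " email text,"
      + " password text)";

    // View keyed by email. Every base primary key column (here: id) must also
    // appear in the view's primary key and be restricted with IS NOT NULL.
    static final String CREATE_USER_BY_EMAIL_VIEW =
        "CREATE MATERIALIZED VIEW IF NOT EXISTS user_by_email AS"
      + " SELECT * FROM user"
      + " WHERE email IS NOT NULL AND id IS NOT NULL"
      + " PRIMARY KEY (email, id)";

    // Login lookup: a single-partition read against the view.
    static final String SELECT_BY_EMAIL =
        "SELECT * FROM user_by_email WHERE email = ?";

    public static void main(String[] args) {
        System.out.println(CREATE_USER_TABLE + ";");
        System.out.println(CREATE_USER_BY_EMAIL_VIEW + ";");
        System.out.println(SELECT_BY_EMAIL + ";");
    }
}
```

With this layout, an `UPDATE user SET email = ? WHERE id = ?` on the base table is enough to change the address; Cassandra maintains the view row itself, which is what the hand-written inverse table would otherwise have to do.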

msek
