2

Without relying on the database, is there a way to ensure a field (let's say a User's emailAddress) is unique.

Some common failure attempts:

  1. Check first if emailAddress exists (by querying the DB) and if not then create the user. Now obviously in the window of check-then-act some other thread can create a user with same email. Hence this solution is no good.
  2. Apply a language-level lock on the method responsible for creating the user. This solution fails as we need redundancy of the service for performance reasons and lock is on a single JVM.
  3. Use an Event store (like an Akka actor's mailbox), event being an AddUser message, but since the actor behavior is asynchronous, the requestor(sender) can't be notified that user creation with unique email was successful. Moreover, how do 2 requests (with same email) know they contain a unique email? This may get complicated.

Database, being a single source of data that every thread and every service instance will write to, makes sense to implement the unique constraint here. But this holds true for Relational databases. Then what about NoSql databases? some do allow for a unique constraint, but it's not their native behavior, or maybe it is.

But the question of not using the database to implement uniqueness of a field, what could be the options?

bitscanbyte
  • 650
  • 8
  • 14
  • " in the window of check-then-act some other thread can create a user with same email." <- You could just synchronize this part and make sure that while one thread is checking if the email exists and creates a new user, no other thread can do the same. – OH GOD SPIDERS Aug 20 '20 at 09:17
  • Synchronization is language (jvm-level) lock and doesn't hold good if multiple instances of service are running. Read point#2 – bitscanbyte Aug 20 '20 at 09:26

3 Answers3

0

I think your question is more generic - "how do I ensure a database write action succeeded, and how do I handle cases where it didn't?". Uniqueness is just one failure mode - you may be attempting to insert a value that's too big, or of the wrong data type, or that doesn't match a foreign key constraint.

Relational databases solve this through being ACID-compliant, and throwing errors for the client to deal with when a transaction fails.

You want (some of) the benefits of ACID without the relational database. That's a fairly big topic of conversation. The obvious way to solve this is to introduce the concept of "transaction" in your application layer. For instance, in your case, you might send a "create account(emailAddress, name, ...)" message, and have the application listen for either an "accountCreated" or "accountCreationFailed" response. The recipient of that message is responsible for writing to the database; you have a couple of options. One is to lock that thread (so only one process can write to the database at any time); that's not super scalable. The other mechanism I've used is introducing status flags - you write the account data to the database with a "draft" flag, then check for your constraints (including uniqueness), and set the "draft" flag to "validated" if the constraints are met (i.e. there is no other record with the same email address), and "failed" if they are not.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • ...`you might send a "create account(emailAddress` if you're achieving this through `language-level lock` then you're restricting to no-redundancy (and potentially **SPOF**) If you're trying to achieve solution through `draft` flag then it's basically a `fire-and-forget` scenario and requestor will always see request succeeded, moreover you're only achieving asynchronous behavior, **not** _not-using_ through database constraints – bitscanbyte Aug 20 '20 at 09:44
0

to check for uniquness you need to store the "state" of the program. for safety you need to be able to apply changes to the state transactionally.

  • you can use database transactions. a few of the NoSQL databases support transactions too, for example, redis and MongoDB. you have to check for each vendor separately to see how they support transactions. in this setup, each client will connect to the database and it will handle all of the details for you. also depending on your use case you should be careful about the isolation level configuration.
  • if durability is not a concern then you can use in memory databases that support transactions.

which state store you choose, it should support transactions. there are several ways to implement transactions and achieve consistency. many relational databases like PostgresSQL achieve this by implementing the MVCC algorithm. in a distributed environment you have to look for distributed transactions such as 2PC, Paxos, etc.

normally everybody relies on availabe datastore solutions unless there is a weird or specific requirement for the project.

final note, the communication pattern is not related to the underlying problem here. for example, in the Actor case you mentioned, at the end of the day, each actor has to query the state to find if a email exists or not. if your state store supports Serializability then there is no problem and conflicts will not happen (communicating the error to the client is another issue). suppose that you are using PostgreSQL. when a insert/update query is issued, it is wrapped around a transaction and the underlying MVCC algorithm will take care of everything. in an advanced and distrbiuted environment you can use data stores that support distributed transactions, like CockroachDB.

if you want to dive deep you can research these keywords: ACID, isolation levels, atomicity, serializability, CAP theorem, 2PC, MVCC, distributed transacitons, distributed locks, ...

Aidin
  • 64
  • 10
  • Database transactions isn't in question here, we are looking for a solution that doesn't involve relying on database constraints/transactions. Relational databases natively support ACID and hence very easy to aid unique constraint via databases. – bitscanbyte Aug 20 '20 at 13:05
  • @bitscanbyte I was not talking about databases, I was talking about a state store. your problem requires storing state. If you don't want to you use available data stores, then you have to implement a part of their functionality in your code. you can also use an embeddable database library like https://github.com/jankotek/mapdb or https://github.com/hazelcast/hazelcast (in embedded mode). hazlecast also supports distributed mode, for example five instances of your application sharing the same state. – Aidin Aug 20 '20 at 14:24
0

NoSQL databases provide different, weaker, guarantees than relational databases. Generally, the tradeoff is you give up ACID guarantees in exchange for increased scalability in the dimensions that matter for your application.

It's possible to provide some kind of uniqueness guarantee, but subject to certain tradeoffs. With NoSQL, there are always tradeoffs.

If your NoSQL store supports optimistic concurrency control, maybe this approach will work:

Store a separate document that contains the set of all emailAddress values, across all documents in your NoSQL table. This is one instance of this document at a given time.

Each time you want to save a document containing emailAddress, first confirm email address uniqueness:

Perform the following actions, protected by optimistic locking. You can on the backend if this due to a concurrent update:

  1. Read this "all emails" document.
  2. Confirm the email isn't present.
  3. If not present, add the email address to the "all emails document"
  4. Save it.

You've now traded one problem ... the lack of unique constraints, for another ... the inability to synchronise updates across your original document and this new "all emails" document. This may or may not be acceptable, it depends on the guarantees that your application needs to provide.

e.g. Maybe you can accept that an email may be added to "all emails", that saving the related document to your other "table" subsequently fails, and that that email address is now not able to be used. You could clean this up with a batch job somehow. Not sure.

The index of emails could be stored in some other service (e.g. a persistent cache). The same problem exists, you need to keep the index and your document store in sync somehow.

There's no easy solution. For a detailed overview of the relevant concepts, I'd recommend Designing Data-Intensive Applications by Martin Kleppmann.

grantn
  • 185
  • 1
  • 8