I am trying to build a highly available, very high-volume shopping cart application. The volume will be high enough that I am considering using Cassandra instead of MySQL for the database.

Now, in a shopping cart system, most database actions have to be 100% consistent, while others do not have to be.

Examples of actions that must be 100% consistent: saving the payment confirmation, and saving the list of purchased items.

Examples of actions that do not need to be 100% consistent: saving the customer's address (if no address is in the database at payment time, assume it was lost and ask the customer again), and other similar things.

Now, if I am running a server cluster within a single region (Amazon EC2), are there any major roadblocks to performing all transactions as maximally consistent transactions? Would that provide the same reliability as a MySQL relational database? Remember, we are dealing with financial transactions here.

Is my data generally "safe" in Cassandra? By that I mean safe against a complete unexpected power failure, a random disk failure, and so on.

Anon21

3 Answers

Specific to your questions about availability and EC2: as Theodore wrote, the consistency level in Cassandra will dictate how "safe" the data is. The problem you'll face is how to ensure that the data reaches Cassandra, fulfills your transaction goals, and is saved appropriately.

There are some good threads about transactions and solving this problem on the Apache Cassandra User's mailing list.

Cassandra on its own is not suitable for transactions.

To get around this, you need "something" above the data tier that manages the transactions and uses Cassandra only as the data store.

Summary: you cannot guarantee financial transactions with Cassandra alone.

sdolgy

There are lots of different ways to define consistency. If by "maximal consistent transaction" you mean reading and writing at ConsistencyLevel ALL, then that will provide consistency in the sense that your reads will never return an out-of-date value, and durability in the sense that your writes will be stored on all replica nodes before returning.
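
To illustrate why reads at ALL cannot miss an acknowledged write, here is a minimal sketch of the usual replica-overlap arithmetic: if the number of replicas written plus the number of replicas read exceeds the replication factor, every read set shares at least one replica with every write set. The function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas (illustrative helper)."""
    return replication_factor // 2 + 1


def replicas_overlap(write_replicas: int, read_replicas: int,
                     replication_factor: int) -> bool:
    """True when any read set must intersect any write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # assumed replication factor for the example

# Writing and reading at ALL: 3 + 3 > 3, so the sets always overlap.
all_overlaps = replicas_overlap(rf, rf, rf)

# Writing and reading at QUORUM: 2 + 2 > 3, so they also overlap.
quorum_overlaps = replicas_overlap(quorum(rf), quorum(rf), rf)

# Writing and reading at ONE: 1 + 1 > 3 is false, so a read may miss a write.
one_overlaps = replicas_overlap(1, 1, rf)
```

Note that this only says a read will reach a replica holding the write. It says nothing about grouping several writes together, which is the point of the next paragraph.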

That's not the same as transactions, however. Cassandra does not support transactions. It doesn't provide consistency between different rows, as MySQL does. For example, suppose you add an item to the shopping basket, and update the total cost in the cart. Individually, each operation will be stored consistently and durably. However, there may be a window of time in which you can see one change but not the other. In a relational database, you can group them into a transaction so that you can only see both, or neither.
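
The window described above can be shown with a small in-memory sketch. This is plain Python with a dictionary standing in for two independently written rows, not Cassandra code; the key names and the `observe` callback are assumptions made for the example.

```python
# Two "rows" that are written one after the other, with no transaction.
store = {"basket_items": ["book"], "basket_total": 10}


def add_item_non_atomic(store, item, price, observe):
    """Apply two separate writes; call observe() between them to model
    a reader that arrives after the first write but before the second."""
    store["basket_items"] = store["basket_items"] + [item]  # first write
    observe(dict(store))                                     # concurrent reader
    store["basket_total"] = store["basket_total"] + price   # second write


snapshots = []
add_item_non_atomic(store, "pen", 2, snapshots.append)

mid_flight = snapshots[0]
# The reader saw the new item but the old total.
reader_saw_mismatch = (
    mid_flight["basket_items"] == ["book", "pen"]
    and mid_flight["basket_total"] == 10
)
```

Each write on its own is complete and correct. The mismatch comes only from the reader arriving between them, which is exactly what a transaction would prevent.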

As far as safety goes, Cassandra stores all your writes to disk in a commit log before it does anything else, in the same way that relational databases use transaction logs. So it is just as safe with regard to system crashes. With regard to node failures, if you write at CL.ALL, then you will never lose data as long as one node in each replica set survives. With regard to disk failure, that is a matter for your underlying hardware setup, e.g. RAID.

Theodore Hong
  • Thanks for providing a lot of clear information. To remedy the lack of transactions, can the following be done? I create a Cassandra table called "locks". When I want to perform a transaction on the database, I add the UUIDs representing the rows I will write to into the locks table. After I am done writing the rows of interest with QUORUM, I delete the UUIDs from the locks table. If an independent query is attempted in the meantime, it will first check whether the relevant UUIDs are present in the locks table, and will be disallowed if they are. Will that permit transactions? Very slow ones only? – Anon21 Oct 21 '11 at 16:50
  • Unfortunately this is still prone to a race condition in the following way: 1. Client A checks the lock table and finds nothing. 2. Client A reads the shopping basket. 3. Client B writes to the lock table. 4. Client B updates the shopping basket and the total cost. 5. Client B clears the locks. 6. Client A reads the total cost, which is now inconsistent with the shopping basket read earlier. This problem cannot be solved without using stronger distributed protocols such as those provided by ZooKeeper (mentioned by @sdolgy). – Theodore Hong Oct 24 '11 at 15:24
  • I think you skipped one of the steps. The initial steps would instead be: 1. Client A checks the lock table and finds nothing. 2. Client A inserts a lock on each item it intends to read or write. 3. Client A reads the shopping basket. 4. Client B checks the lock table, finds a lock on the items it is interested in, and therefore stops. 5. Client A finishes its operations. – Anon21 Oct 24 '11 at 15:56
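
The revised protocol in the last comment still separates "check the lock table" from "insert the lock", so two clients can both pass the check before either inserts. A minimal in-memory sketch of that interleaving follows. It is a plain Python model with a hypothetical `LockTable` class, not Cassandra code, and it assumes the check and the insert are two separate operations with no atomic check-and-set between them.

```python
class LockTable:
    """In-memory stand-in for the proposed "locks" table."""

    def __init__(self):
        self.rows = {}  # row id -> set of clients that inserted a lock

    def is_locked(self, row_id):
        return bool(self.rows.get(row_id))

    def insert(self, row_id, client):
        self.rows.setdefault(row_id, set()).add(client)


def run_interleaving():
    """Both clients check before either one inserts its lock."""
    table = LockTable()
    row = "basket-1"  # hypothetical row id

    a_saw_free = not table.is_locked(row)  # step 1 for client A
    b_saw_free = not table.is_locked(row)  # step 1 for client B, before A inserts

    if a_saw_free:
        table.insert(row, "A")             # step 2 for client A
    if b_saw_free:
        table.insert(row, "B")             # step 2 for client B

    return table.rows[row]


holders = run_interleaving()
# Both clients now believe they hold the lock on the same row.
both_hold_lock = holders == {"A", "B"}
```

Closing this gap needs the check and the insert to happen as one atomic step, which is the kind of guarantee the earlier comment attributes to stronger distributed protocols such as ZooKeeper.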

As of 2022 Cassandra supports transactions.

Find out how BestBuy are using it: https://www.slideshare.net/joelcrabb/cassandra-and-riak-at-bestbuycom