
I've never used CouchDB/MongoDB/Couchbase before and am evaluating them for my application. Generally speaking, they seem to be a very interesting technology that I would like to use. However, coming from an RDBMS background, I am hung up on the lack of transactions. But at the same time, I know that there is going to be much less need for transactions than I would have in an RDBMS, given the way the data is organized.

That being said, I have the following requirement and am not sure if/how I can use a NoSQL DB for it.

  1. I have a list of clients
  2. Each client can have multiple files
  3. Each file must be sequentially numbered for that specific client

Given an RDBMS, this would be fairly simple: one table for clients, one (or more) for files. In the clients table, keep a counter of the last file number, and increment it by one when inserting a new record into the files table. Wrap everything in a transaction and you are assured that there are no inconsistencies. Heck, just to be safe, I could even put a unique constraint on a (clientId, filenumber) index to ensure that the same filenumber is never used twice for a client.

How can I accomplish something similar in MongoDB or CouchDB/base? Is it even feasible? I keep reading about two-phase commits, but I can't seem to wrap my head around how that works in this kind of instance. Is there anything in Spring/Java that provides two-phase commit that would work with these DBs, or does it need to be custom code?

Eric B.

2 Answers


CouchDB is transactional by default at the document level. Every document in CouchDB contains a _rev key, and all updates to a document are performed against this _rev:

  1. Get the document.
  2. Send it for update using the _rev property.
  3. If the update succeeds, then you have updated the latest _rev of the document.
  4. If the update fails, the _rev you sent was stale, meaning the document was modified in the meantime. Repeat steps 1-3.
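The read-modify-write loop above can be sketched as follows. This is a purely in-memory simulation of CouchDB's MVCC rule, not the real HTTP API; the class and function names (`FakeCouch`, `update_with_retry`) are illustrative only.

```python
import uuid


class ConflictError(Exception):
    """Stands in for CouchDB's 409 Conflict response."""


class FakeCouch:
    """Tiny in-memory stand-in for CouchDB's per-document MVCC rule."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])

    def put(self, doc):
        existing = self.docs.get(doc["_id"])
        # CouchDB rejects a write whose _rev doesn't match the stored revision
        if existing is not None and existing["_rev"] != doc.get("_rev"):
            raise ConflictError("stale _rev")
        stored = dict(doc)
        stored["_rev"] = uuid.uuid4().hex  # real CouchDB uses "N-<hash>" revisions
        self.docs[doc["_id"]] = stored
        return stored["_rev"]


def update_with_retry(db, doc_id, mutate, max_retries=5):
    for _ in range(max_retries):
        doc = db.get(doc_id)  # 1. get the document (with its current _rev)
        mutate(doc)           # 2. change some fields
        try:
            db.put(doc)       # 3. send the update using that _rev
            return db.get(doc_id)
        except ConflictError:
            continue          # 4. someone else updated first: repeat steps 1-3
    raise ConflictError("gave up after %d retries" % max_retries)
```

With a real CouchDB you would do the same loop over HTTP: GET the document, PUT it back with the `_rev` you read, and retry on a 409 response.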

Check out this answer by MrKurt for a more detailed explanation.

The CouchDB recipes page has a banking example that shows how transactions are done in CouchDB.

And there is also this atomic bank transfers article that illustrates transactions in CouchDB.

Anyway, the common theme in all of these links is that if you follow the CouchDB pattern of updating against a _rev, you can't end up with an inconsistent state in your database.

Heck, just to be safe, I could even put a unique constraint on a (clientId, filenumber) index to ensure that there is never the same filenumber used twice for a client.

All CouchDB documents are unique, since the _id fields of two documents can't be the same. Check out the view cookbook:

This is an easy one: within a CouchDB database, each document must have a unique _id field. If you require unique values in a database, just assign them to a document’s _id field and CouchDB will enforce uniqueness for you.

There’s one caveat, though: in the distributed case, when you are running more than one CouchDB node that accepts write requests, uniqueness can be guaranteed only per node or outside of CouchDB. CouchDB will allow two identical IDs to be written to two different nodes. On replication, CouchDB will detect a conflict and flag the document accordingly.

Edit based on comment

In a case where you want to increment a field in one document based on the successful insert of another document

You could use separate documents in this case. You insert a document and wait for the success response, then add another document like:

{"_id": "some_id", "count": 1}

With this you can set up a map/reduce view that simply counts these documents, and you have your counter. Instead of updating a single counter document, you are inserting a new document to reflect each successful insert.
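A sketch of that insert-only counter, simulated in memory (a real CouchDB view would be a JavaScript map function paired with the built-in `_count` or `_sum` reduce; all names here are illustrative):

```python
# One document per successful file insert; the "view" counts them per client.
docs = []


def record_successful_insert(client_id, file_id):
    # append a new counter document instead of mutating a shared one,
    # so there is nothing to conflict on
    docs.append({"_id": "count:%s:%s" % (client_id, file_id),
                 "type": "file_count", "client": client_id, "count": 1})


def map_fn(doc):
    # view "map" step: emit one (key, value) row per counter document
    if doc.get("type") == "file_count":
        yield doc["client"], doc["count"]


def count_files(client_id):
    # view "reduce" step: sum the emitted values for this client (_sum)
    rows = [v for d in docs for k, v in map_fn(d) if k == client_id]
    return sum(rows)
```

The appeal of this shape is that writers never contend on a shared counter document; the count is derived at read time.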

I always end up with the case where a failed file insert would leave the DB in an inconsistent state especially with another client successfully inserting a file at the same time.

Okay, so I already described how you can do updates over separate documents, but even when updating a single document you can avoid inconsistency if you:

  1. Insert a new file.
  2. When CouchDB gives a success message, attempt to update the counter.

Why does this work?

This works because when you try to update a document you must supply a _rev string. You can think of _rev as a local state for your document. Consider this scenario:

  1. You read the document that is to be updated.
  2. You change some fields.
  3. Meanwhile another request has already changed the original document. This means the document now has a new _rev.
  4. You ask CouchDB to update the document with the now-stale _rev that you read in step 1.
  5. CouchDB will generate an exception.
  6. You read the document again, get the latest _rev, and attempt the update again.

So if you do this you will always have to update against the latest revision of the document. I hope this makes things a bit clearer.

Note:

As pointed out by Daniel the _rev rules don't apply to bulk updates.

Akshat Jiwan Sharma
    "Couchdb is transactional by default." - just clarifying...That is per document. Not spanning multiple documents. You can have bulk updates being treated as "one" unit in case a validation handler is invalid, but it's not a transaction. – Daniel Sep 09 '14 at 17:35
  • I've read the atomic bank transfers articles before, but they don't handle a case where you need to retain an accurate count. In a case where you want to increment a field in one document based on the successful insert of another document, I am unable to see how the bank transfer example would work. No matter how I try to structure it, I always end up with the case where a failed file insert would leave the DB in an inconsistent state - especially with another client successfully inserting a file at the same time. – Eric B. Sep 10 '14 at 02:19

Yes, you can do the same with MongoDB and Couchbase/CouchDB using the proper approach.

First of all, in MongoDB you have unique indexes, which will help you solve part of the problem: http://docs.mongodb.org/manual/tutorial/create-a-unique-index/

You also have a pattern for implementing sequences properly: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
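The core of that auto-incrementing pattern is a single atomic "increment and return" against a counters document. Here is an in-memory stand-in (with real MongoDB and pymongo you would call something like `db.counters.find_one_and_update({"_id": client_id}, {"$inc": {"seq": 1}}, upsert=True, return_document=ReturnDocument.AFTER)` and rely on MongoDB's per-document atomicity; the lock below simulates that guarantee):

```python
import threading

# counters plays the role of the "counters" collection from the tutorial
counters = {}
lock = threading.Lock()  # simulates MongoDB's per-document atomic update


def next_file_number(client_id):
    with lock:
        counters[client_id] = counters.get(client_id, 0) + 1
        return counters[client_id]
```

Paired with a unique compound index on (clientId, fileNumber), e.g. `files.create_index([("clientId", 1), ("fileNumber", 1)], unique=True)` in pymongo, a duplicate number would still be rejected even if application code misbehaves.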

You have many options for implementing cross-document/collection transactions; you can find some good information about this in this blog post: http://edgystuff.tumblr.com/post/93523827905/how-to-implement-robust-and-scalable-transactions (the two-phase commit is documented in detail here: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ )
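A toy version of the two-phase commit pattern from that tutorial, simulated in memory (all collection and field names are illustrative): a transaction document moves through the states initial -> pending -> applied -> done, and every affected document is tagged with the transaction id while the writes are in flight, so a crashed writer can later be detected and the operation resumed or rolled back.

```python
# In-memory stand-ins for the clients, files and transactions collections.
clients = {}
files = {}
transactions = {}


def create_file(txn_id, client_id, file_id):
    clients.setdefault(client_id, {"last_file": 0, "pending_txns": []})
    txn = {"state": "initial", "client": client_id, "file": file_id}
    transactions[txn_id] = txn

    txn["state"] = "pending"
    client = clients[client_id]
    # phase 1: apply both writes, each carrying the transaction id so a
    # recovery job can find half-finished work
    client["last_file"] += 1
    client["pending_txns"].append(txn_id)
    files[file_id] = {"client": client_id,
                      "number": client["last_file"], "txn": txn_id}
    txn["state"] = "applied"

    # phase 2: remove the markers; only now is the operation complete
    client["pending_txns"].remove(txn_id)
    del files[file_id]["txn"]
    txn["state"] = "done"
    return files[file_id]["number"]
```

The important property is not this happy path but the recoverability: if the writer dies between the phases, the "pending"/"applied" state and the txn tags tell you exactly which documents to finish or undo.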

Since you are talking about Couchbase, you can find some patterns here too: http://docs.couchbase.com/couchbase-devguide-2.5/#providing-transactional-logic

Tug Grall
  • I've already read about two-phase commits but I still don't see how it would apply here. If I have a single thread, I can see it working, but if I have multiple threads I have no guarantee on atomicity - that is thread #2 can be fully executed before thread #1 is finished, and if the first fails, but the second succeeds, the file number/counter will be incorrect. Unless I am misunderstanding how to properly do a two-phase commit (I don't think so). – Eric B. Sep 10 '14 at 02:27