15

I've read a bit about CouchDB and I'm really intrigued by the fact that it's "append-only". I may be misunderstanding that, but as I understand it, it works a bit like this:

  • data is added to the DB at time t0 stating that the name of the user with ID 1 is "Cedrik Martin"

  • a query asking "what is the name of the user with ID 1?" returns "Cedrik Martin"

  • at time t1 an update is made to the DB stating: "The name of the user with ID 1 is Cedric Martin" (changing the 'k' to a 'c').

  • a query asking again "what is the name of the user with ID 1" now returns "Cedric Martin"

It's a silly example, but it's because I'd like to understand something fundamental about CouchDB.

Given that the update was made by appending to the end of the DB, is it possible to query the DB "as it was at time t0", without doing anything special?

Can I ask CouchDB "What was the name of the user with ID 1 at time t0?" ?

EDIT: The first answer is very interesting, so I have a more precise question: as long as I'm not "compacting" a CouchDB database, can I write queries that are somehow "referentially transparent" (i.e. they'll always produce the same result)? For example, if I query for "document d at revision r", am I guaranteed to always get the same answer back as long as I'm not compacting the DB?

Cedric Martin

4 Answers

34

Perhaps the most common mistake made with CouchDB is to believe it provides a versioning system for your data. It does not.

Compaction removes all non-latest revisions of all documents, and replication only replicates the latest revision of any document. If you need historical versions, you must preserve them in your latest revision using any scheme that seems good to you.

"_rev" is, as noted, an unfortunate name, but no other word has been suggested that is any clearer. "_mvcc" and "_mvcc_token" have been suggested before. The issue with both is that any description of what's going on there will inevitably include the caveat "old versions remain on disk until compaction", which will still imply that it's a user versioning system.

To answer the question "Can I ask CouchDB "What was the name of the user with ID 1 at time t0?" ?", the short answer is "NO". The long answer is "YES, but then later it won't work", which is just another way of saying "NO". :)

Robert Newson
5

As already said, it is technically possible, but you shouldn't count on it. It isn't only about compaction; it's also about replication, one of CouchDB's biggest strengths. But yes, if you never compact and never replicate, you will always be able to fetch all previous versions of all documents. I don't think it will work with queries, though: views only see the latest revision of each document, not older ones.
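
To see which old revisions are still retrievable, a document can be requested with `?revs_info=true`, which adds a `_revs_info` array whose entries carry a `rev` and a `status` (`available`, `missing` or `deleted`). Below is a minimal sketch of reading that array. The helper name `availableRevisions` and the sample response are made up for illustration and are not real server output:

```javascript
// Hypothetical helper: list the revisions whose bodies can still be fetched.
// `doc` is the parsed JSON of GET /{db}/{docid}?revs_info=true.
function availableRevisions(doc) {
  const info = doc._revs_info || [];
  return info
    .filter(function (entry) { return entry.status === "available"; })
    .map(function (entry) { return entry.rev; });
}

// Illustrative response (invented values): after compaction, older
// revisions are typically reported as "missing".
const sampleDoc = {
  _id: "user_1",
  _rev: "2-bbbb",
  name: "Cedric Martin",
  _revs_info: [
    { rev: "2-bbbb", status: "available" },
    { rev: "1-aaaa", status: "missing" }
  ]
};

// Only "2-bbbb" is still available in this sample.
console.log(availableRevisions(sampleDoc));
```

Checking the status first avoids relying on an old revision that compaction has already discarded.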

Basically, calling it "rev" was the biggest mistake in CouchDB's design. It should have been called "mvcc_token" or something like that: it really only implements MVCC and isn't meant to be used for versioning.

Ladicek
4

Answer to the second question: YES.

Changed data is always added to the tree with a higher revision number; an existing revision is never changed.

For your information:

The revision (e.g. 1-abcdef) is built like this: the leading 1 is the version number (here: first version), and the second part is a hash over the document content (I'm not sure whether some additional "salt" goes into it). So the same doc content will always produce the same revision ID (with the same CouchDB setup), even on other machines, as long as it is at the same revision level (1-, 2-, 3-).
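
A small sketch of taking such a revision string apart into its counter and hash parts. `parseRev` is a hypothetical helper name, and the revision strings are made-up examples:

```javascript
// Hypothetical helper: split a revision string such as "1-abcdef" into its
// generation counter and the hash part after the first dash.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  const generation = Number(rev.slice(0, dash));
  if (dash < 1 || !Number.isInteger(generation)) {
    throw new Error("not a revision string: " + rev);
  }
  return { generation: generation, hash: rev.slice(dash + 1) };
}

const first = parseRev("1-abcdef");
const second = parseRev("2-0123ab");

console.log(first.generation, first.hash);
// A later revision of the same document has a higher generation counter.
console.log(second.generation > first.generation);
```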

Another way: if you need to keep old versions, you can store them inside a bigger container doc:

{
  "_id": "docHistoryContainer_5374",
  "doc_id": "5374",
  "versions": [
    {"v": 1,
     "date": [2012, 3, 15],
     "doc": { .... doc_content v1 .... }
    },
    {"v": 2,
     "date": [2012, 3, 16],
     "doc": { .... doc_content v2 .... }
    }
  ]
}
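
Maintaining such a container means appending a new entry on every change rather than overwriting the old one. Here is a rough sketch, assuming the field names from the example above. `appendVersion` is a hypothetical helper, and the returned object would still have to be written back to CouchDB with a normal document update:

```javascript
// Hypothetical helper: return a copy of the history container with one more
// version entry appended. The input container is left untouched.
function appendVersion(container, docContent, date) {
  const versions = container.versions || [];
  const last = versions.length ? versions[versions.length - 1].v : 0;
  const entry = {
    v: last + 1,
    // Same [year, month, day] layout as in the container example.
    date: [date.getFullYear(), date.getMonth() + 1, date.getDate()],
    doc: docContent
  };
  return Object.assign({}, container, { versions: versions.concat([entry]) });
}

const empty = { _id: "docHistoryContainer_5374", doc_id: "5374", versions: [] };
const v1 = appendVersion(empty, { name: "Cedrik Martin" }, new Date(2012, 2, 15));
const v2 = appendVersion(v1, { name: "Cedric Martin" }, new Date(2012, 2, 16));

// Version numbers stored so far.
console.log(JSON.stringify(v2.versions.map(function (e) { return e.v; })));
```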

Then you can ask for revisions:

View "byRev":

function (doc) {
  if (!doc.versions) return;  // skip documents that hold no history
  doc.versions.forEach(function (ver) {
    emit([doc.doc_id, ver.v], ver);  // key: [doc id, version number]
  });
}

call:

/byRev?startkey=["5374"]&endkey=["5374",{}]

result:

{ "id": "docHistoryContainer_5374", "key": ["5374", 1], "value": {...doc_content v1 ....} }
{ "id": "docHistoryContainer_5374", "key": ["5374", 2], "value": {...doc_content v2 ....} }
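
One way to check what such a view would index without a running server is to call the map function locally with a stand-in for `emit`. This is only a sketch: `runMap`, the local `emit` and the sample document are invented for illustration. The real rows come from CouchDB's view server, which also sorts them by key.

```javascript
// Stand-in for CouchDB's emit(): collects rows instead of indexing them.
let collected = [];
let currentDocId = null;
function emit(key, value) {
  collected.push({ id: currentDocId, key: key, value: value });
}

// Map function in the shape CouchDB expects for the "byRev" view.
function byRev(doc) {
  if (!doc.versions) return;
  doc.versions.forEach(function (ver) {
    emit([doc.doc_id, ver.v], ver);
  });
}

// Hypothetical local runner: apply a map function to a list of documents.
function runMap(mapFn, docs) {
  collected = [];
  docs.forEach(function (doc) {
    currentDocId = doc._id;
    mapFn(doc);
  });
  return collected;
}

const sample = {
  _id: "docHistoryContainer_5374",
  doc_id: "5374",
  versions: [
    { v: 1, date: [2012, 3, 15], doc: { name: "Cedrik Martin" } },
    { v: 2, date: [2012, 3, 16], doc: { name: "Cedric Martin" } }
  ]
};

const rows = runMap(byRev, [sample]);
rows.forEach(function (row) {
  console.log(JSON.stringify(row.key), row.value.doc.name);
});
```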

Additionally, you can now also write a map function that emits the date in the key, so you can ask for revisions in a date range.
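
A sketch of that idea: a map function that puts the date into the key, plus a helper that builds the matching range query. View parameters such as `startkey` and `endkey` are JSON values that need to be URL-encoded. The view name `byDate` and the helper `dateRangeQuery` are assumptions for illustration:

```javascript
// Map function for a hypothetical "byDate" view: the key is
// [doc id, year, month, day], so rows of one document sort chronologically.
// emit() is provided by CouchDB's view server; it is not called in this file.
function byDate(doc) {
  if (!doc.versions) return;
  doc.versions.forEach(function (ver) {
    emit([doc.doc_id].concat(ver.date), ver);
  });
}

// Hypothetical helper: build the query string for a date range.
// Dates use the same [year, month, day] layout as the container document.
function dateRangeQuery(docId, fromDate, toDate) {
  const startkey = JSON.stringify([docId].concat(fromDate));
  const endkey = JSON.stringify([docId].concat(toDate));
  return "/byDate?startkey=" + encodeURIComponent(startkey) +
         "&endkey=" + encodeURIComponent(endkey);
}

console.log(dateRangeQuery("5374", [2012, 3, 15], [2012, 3, 16]));
```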

okurow
  • geez but that is totally huge! So as long as you're not using compact and as long as you query in a date range "lesser or equal" to the current date you're guaranteed that your queries are referentially transparent? (at least in the context of a specific DB) That is an amazing feature I think! It has the potential to make it much, much easier to "recreate the state" (for example when tracing/debugging). And simply, well, much easier to reason about the program overall. That definitely gets me **very** interested in CouchDB : ) +1 to both answers : ) – Cedric Martin Mar 16 '12 at 19:06
  • Sorry... date querying only applies to the second approach (the history container). You cannot write a map function that looks at the content of old revisions. You can only ask a specific doc for its revisions and then retrieve the content of a given revision; you cannot query across them. – okurow Mar 16 '12 at 19:51
1

What you call t0 (t1, ...) is called a "revision" in CouchDB. Each time you change a document, its revision number increases. A doc's old revisions are kept until you no longer want them and tell the database to "compact". Look at "Accessing Previous Revisions" in http://wiki.apache.org/couchdb/HTTP_Document_API
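
In terms of the HTTP API, a document's revision list is returned with `?revs_info=true` (or `?revs=true`), and one specific revision is fetched with `?rev=<revision>`. A small sketch that only builds those URLs. The server address, the database name `users`, the document ID and the revision string are placeholders:

```javascript
// Placeholder server address for illustration; adjust to the actual server.
const base = "http://localhost:5984";

// Hypothetical helper: build a document URL with optional query parameters.
function docUrl(db, docId, params) {
  const query = Object.keys(params || {})
    .map(function (k) {
      return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
    })
    .join("&");
  return base + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId) +
         (query ? "?" + query : "");
}

// List the known revisions of the document and whether each is still available.
const listUrl = docUrl("users", "1", { revs_info: true });
// Fetch one specific revision (works only while compaction has not removed it).
const oldUrl = docUrl("users", "1", { rev: "1-abcdef" });

console.log(listUrl);
console.log(oldUrl);
```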

okurow
  • +1... that is *very* interesting. I edited my question a bit more: basically I'd want to know if queries can be made to be referentially transparent (when a specific revision is specified). – Cedric Martin Mar 16 '12 at 17:27