
My application uses an AbstractFactory for the DAO layer, so once the HBase DAO family has been implemented, I would like to create a Cassandra DAO family and compare the two from several points of view.
While trying to do that, I saw that Cassandra doesn't support cell versioning the way HBase does (and my application relies heavily on it), so I was wondering whether there is a table design trick (or something else) to "emulate" this behaviour in Cassandra.

Andrea

1 Answer


One common strategy is to use composite column names with two components: the normal column name, and a version. What you use for the version component depends on your access patterns. If you might have updates coming from multiple clients simultaneously, then using a TimeUUID is your safest option. If only one client may update at a time, you can use something smaller, like a timestamp or version number.
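
To make the TimeUUID option concrete, here is a minimal sketch using Python's standard `uuid` module, whose `uuid1()` produces version-1 (time-based) UUIDs, the same UUID variant that Cassandra's TimeUUID type is built on. The helper name `new_version_id` is made up for illustration and is not part of any Cassandra client.

```python
import uuid


def new_version_id():
    """Return a version-1 (time-based) UUID to use as a version component.

    Hypothetical helper: the name is illustrative only.
    """
    return uuid.uuid1()


a = new_version_id()
b = new_version_id()

# Both IDs embed a timestamp, yet they are distinct values, so two writers
# do not overwrite each other's column the way two equal plain timestamps would.
print(a.version, b.version, a != b)
```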

Assuming you use version numbers for simplicity, here's what that might look like for storing documents with versioned fields:

| ('body', 5) | ('body', 4) | ... | ('title', 1) | ('title', 0) |
|-------------|-------------|-----|--------------|--------------|
| 'Neque ...' | 'Dolor ...' | ... | 'Lorem Ipsum'| 'My Document'|

This format is primarily useful if you want a specific version of a field, all versions of a field, or all versions of all fields.
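
The three access patterns can be sketched with a small in-memory model of one wide row. This is only an illustration of how sorted composite column names behave, not Cassandra client code: the class `VersionedRow` and its methods are hypothetical, and it sorts versions in ascending order, whereas the layout above shows them newest first (which would correspond to a reversed comparator).

```python
import bisect


class VersionedRow:
    """In-memory stand-in for one row whose column names are (field, version)."""

    def __init__(self):
        self._names = []   # composite column names, kept sorted
        self._values = {}  # composite column name -> value

    def put(self, field, version, value):
        name = (field, version)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def get(self, field, version):
        """A specific version of a field, or None if it was never written."""
        return self._values.get((field, version))

    def versions(self, field):
        """All versions of one field as a contiguous slice, oldest first."""
        lo = bisect.bisect_left(self._names, (field,))
        hi = bisect.bisect_right(self._names, (field, float("inf")))
        return [(n[1], self._values[n]) for n in self._names[lo:hi]]

    def all_columns(self):
        """All versions of all fields, in column-name order."""
        return [(n, self._values[n]) for n in self._names]


row = VersionedRow()
row.put("title", 0, "My Document")
row.put("title", 1, "Lorem Ipsum")
row.put("body", 4, "Dolor ...")
row.put("body", 5, "Neque ...")

print(row.get("body", 4))
print(row.versions("title"))
```

Because the field name is the first component, every version of a field sits in one contiguous range of the row, which is why fetching "all versions of a field" is a single slice.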

If you also want to support efficiently fetching the latest version of all fields at once, I suggest you denormalize and add a second column family where only the latest version of each field is stored in its normal form. You can blindly overwrite these fields for each change. Continuing our example, this column family would look like:

|   'body'    |    'title'    |
|-------------|---------------|
| 'Neque ...' | 'Lorem Ipsum' |
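
A minimal sketch of that dual write, again as a plain in-memory model rather than real client code. The class `DocumentStore` and its attribute names are hypothetical, and the blind overwrite assumes updates are applied in version order.

```python
class DocumentStore:
    """Models two column families: full history plus latest values only."""

    def __init__(self):
        self.history = {}  # row key -> {(field, version): value}
        self.latest = {}   # row key -> {field: value}

    def update(self, doc_id, field, version, value):
        # Write 1: append the new version to the versioned column family.
        self.history.setdefault(doc_id, {})[(field, version)] = value
        # Write 2: blindly overwrite the field in the "latest" column family.
        self.latest.setdefault(doc_id, {})[field] = value

    def current(self, doc_id):
        """Latest version of every field, read from a single row."""
        return dict(self.latest.get(doc_id, {}))


store = DocumentStore()
store.update("doc1", "title", 0, "My Document")
store.update("doc1", "title", 1, "Lorem Ipsum")
store.update("doc1", "body", 4, "Dolor ...")
store.update("doc1", "body", 5, "Neque ...")

print(store.current("doc1"))
```
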
Tyler Hobbs
  • Thanks for your clear response. What if my access pattern is something like "give me cells older than timestamp XXXX"? I think I would need to parse each column to find what I need. That shouldn't be a big issue because my rows have about 50 columns, but I'm wondering whether there is a more appropriate approach in this case. – Andrea Sep 29 '12 at 20:04
  • If you want to access cells based on time, use a timestamp (or TimeUUID) as the first component of the column names; this will cause them to be sorted by time, making it efficient to fetch anything from a slice of time. – Tyler Hobbs Oct 08 '12 at 19:29
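
The time-first layout from the last comment can be sketched the same way. This is an in-memory illustration only: `TimeOrderedRow` is a hypothetical name, and the integer timestamps are made-up sample values.

```python
import bisect


class TimeOrderedRow:
    """One row whose column names are (timestamp, field), sorted by time."""

    def __init__(self):
        self._names = []   # composite column names, kept sorted
        self._values = {}  # composite column name -> value

    def put(self, timestamp, field, value):
        name = (timestamp, field)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def older_than(self, timestamp):
        """All cells written strictly before `timestamp`, as one slice."""
        hi = bisect.bisect_left(self._names, (timestamp,))
        return [(n, self._values[n]) for n in self._names[:hi]]


cells = TimeOrderedRow()
cells.put(100, "title", "My Document")
cells.put(200, "body", "Dolor ...")
cells.put(300, "title", "Lorem Ipsum")

print(cells.older_than(300))
```

With the timestamp leading, "cells older than XXXX" becomes a prefix slice of the row instead of a scan over every column.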