-1

I need to understand in detail how to design efficient data structures in Cassandra. Is there an online demo or tutorial for understanding the data structure of Cassandra? I need to be able to design column families with their columns and payloads, and see some specific, tangible examples. I'd appreciate it if anyone could recommend a source that would allow me to do this.

TomFH

1 Answer

0

The Cassandra codebase is made up of several thousand classes, so I doubt C*'s performance can be attributed to a single data structure. The topic is also a bit too complicated for a single online demo, however...

What better source than the source... Start looking through the code and check out which data structures are used. Data in memory is stored in something called a memtable, an in-memory structure that keeps recent writes sorted by key. When a memtable fills up, it is flushed to disk as a sorted string table (SSTable), an immutable file that is also sorted by key. This SO question does a comparison between binary tries and SSTables for indexing columns in the DB.
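To make the memtable-to-SSTable flow a bit more tangible, here is a minimal toy sketch in Python. It is not Cassandra's implementation: the names `Memtable`, `ToyStore`, `sstable_get` and the `flush_threshold` parameter are made up for illustration, the "SSTables" are plain in-memory tuples rather than files, and there is no commit log, compaction, or tombstone handling.

```python
import bisect


class Memtable:
    """Toy in-memory write buffer (hypothetical, not Cassandra's class)."""

    def __init__(self, flush_threshold=3):
        self.data = {}
        self.flush_threshold = flush_threshold  # assumed size limit, in keys

    def put(self, key, value):
        self.data[key] = value

    def is_full(self):
        return len(self.data) >= self.flush_threshold

    def flush(self):
        """Return an immutable, key-sorted run standing in for an SSTable."""
        run = tuple(sorted(self.data.items()))
        self.data = {}
        return run


def sstable_get(run, key):
    """Binary-search a sorted run for key; return None if it is absent."""
    keys = [k for k, _ in run]
    i = bisect.bisect_left(keys, key)
    if i < len(run) and run[i][0] == key:
        return run[i][1]
    return None


class ToyStore:
    """Writes go to the memtable; full memtables become sorted runs."""

    def __init__(self, flush_threshold=3):
        self.memtable = Memtable(flush_threshold)
        self.sstables = []  # oldest first, newest last

    def put(self, key, value):
        self.memtable.put(key, value)
        if self.memtable.is_full():
            self.sstables.append(self.memtable.flush())

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable.data:
            return self.memtable.data[key]
        for run in reversed(self.sstables):
            value = sstable_get(run, key)
            if value is not None:
                return value
        return None
```

The point of the sketch is only the shape of the design: writes are cheap because they touch memory first, and reads may have to consult several sorted runs, newest first.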

The other data structure I found interesting is the Merkle tree, used during repairs. This is a binary tree of hashes, in which each parent node holds a hash of its children. There are advantages and disadvantages to using a Merkle tree, but the main advantage (and, I guess, disadvantage) is that it reduces how much data needs to be transferred across the wire for repairs (aka tree synchronization) at the expense of the local IO required to compute the tree's hashes. Read more details in this SO answer and read about Merkle trees on Wikipedia. There is also a great description of how Merkle trees are used during repair in sections 4.6 and 4.7 of the Dynamo paper.
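The trade-off described above can be sketched with a small Merkle tree comparison in Python. This is an illustrative toy, not how Cassandra builds its trees: the helper names `build_merkle` and `differing_leaves` are hypothetical, each leaf here is a single serialized row (a real repair hashes ranges of data), SHA-256 is an arbitrary choice, and both replicas are assumed to have the same number of leaves.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves):
    """Return the tree as a list of levels: levels[0] holds the leaf
    hashes and levels[-1] holds the single root hash."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd count: pair the last node with itself.
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Walk down from the root, descending only into subtrees whose
    hashes differ, and return the indices of the mismatched leaves."""
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):
        next_suspects = []
        for idx in suspects:
            if a[depth][idx] != b[depth][idx]:
                for child in (2 * idx, 2 * idx + 1):
                    if child < len(a[depth - 1]):
                        next_suspects.append(child)
        suspects = next_suspects
    return [i for i in suspects if a[0][i] != b[0][i]]


# Two replicas that disagree on one row only.
replica_a = [b"row1:v1", b"row2:v1", b"row3:v1", b"row4:v1"]
replica_b = [b"row1:v1", b"row2:v1", b"row3:v2", b"row4:v1"]

tree_a = build_merkle(replica_a)
tree_b = build_merkle(replica_b)
out_of_sync = differing_leaves(tree_a, tree_b)
```

Only hashes are compared between the two replicas, and only the leaves listed in `out_of_sync` would need to be streamed, but every replica has to read its local data to compute the hashes first.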

Community
Lyuben Todorov
  • Thank you. However, I am looking for guidance on the representation of business data in a column family, containing row id, column family, column, payload, etc. Must the columns be one next to the other (horizontally), or can they be one after the other (vertically)? Tx – TomFH Aug 01 '13 at 05:29