Cassandra File System

Question

According to the Brisk implementation [presentation at Cassandra SF], Cassandra, CFS, the Job/Task Tracker, and the Hive Metastore all run in a single JVM, which is very different from configuring an independent Hadoop cluster.

Is this an advantage?

What happens if the Task Tracker or any of the other components in the JVM fails? Will that affect the Cassandra instance in the same JVM?

Where does CFS get its data from? Does it store the SSTables as sub-blocks, or a copy of them? Where is the compression of the sub-blocks done?

Regards, Tamil

score 3 · Accepted Answer · answered Nov 03 '11 at 23:54

Brisk does run all of it in a single JVM, but in separate, independent threads that don't affect one another. The trackers run on a dedicated node, but there is no single point of failure. Any node can be elected to run the trackers, and all of the state is persisted to the Cassandra cluster.

The advantage to it all being in the same JVM is that there's no copy and serialization overhead for moving data from Cassandra into the Hadoop code.

CassandraFS breaks the 64MB HDFS blocks into 2MB chunks and stores them as columns in Cassandra, with one row per block. The files themselves are mapped to a list of block row UUIDs in the inodes column family.
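
To make that layout concrete, here is a minimal sketch in Python. It is not the actual Brisk code: plain dicts stand in for the two column families, and the names `store_file`, `read_file`, `inodes`, and `blocks` are made up for illustration. With the sizes above, a full block row would hold 64 / 2 = 32 sub-block columns.

```python
import uuid

BLOCK_SIZE = 64 * 1024 * 1024      # 64 MB block, as described above
SUB_BLOCK_SIZE = 2 * 1024 * 1024   # 2 MB chunk, stored as one column


def store_file(path, data, inodes, blocks,
               block_size=BLOCK_SIZE, sub_block_size=SUB_BLOCK_SIZE):
    """Split data into blocks (one row each) and sub-blocks (one column each).

    `inodes` and `blocks` are plain dicts standing in for column families
    (hypothetical stand-ins, not the real Brisk schema).
    """
    block_ids = []
    for b_start in range(0, len(data), block_size):
        block = data[b_start:b_start + block_size]
        block_id = uuid.uuid4()
        # One row per block; each column is one sub-block keyed by its index.
        blocks[block_id] = {
            i: block[s_start:s_start + sub_block_size]
            for i, s_start in enumerate(range(0, len(block), sub_block_size))
        }
        block_ids.append(block_id)
    # The inode maps the file path to the ordered list of block row UUIDs.
    inodes[path] = block_ids
    return block_ids


def read_file(path, inodes, blocks):
    """Reassemble a file by walking its block rows and sub-block columns in order."""
    return b"".join(
        blocks[block_id][i]
        for block_id in inodes[path]
        for i in sorted(blocks[block_id])
    )
```

Passing smaller `block_size` and `sub_block_size` values makes it easy to try the layout on a few hundred bytes instead of real 64 MB blocks.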
