I can't figure out whether, in Apache Cassandra's implementation, the notions of partition and column family are the same. It seems that Cassandra is no longer a column-family database but rather a tabular, partitioned database. Can someone please explain? I'm following this paper.

Hassam Abdelillah
  • Please take a look at http://stackoverflow.com/questions/18824390/whats-the-difference-between-creating-a-table-and-creating-a-columnfamily-in-cas and http://stackoverflow.com/questions/36210321/comparing-cassandra-structure-with-relational-databases/36210877#36210877 – mmatloka Mar 25 '16 at 10:24

2 Answers

It is the same thing, just seen from a different point of view.

A table is the two-dimensional view of a column family. Basically, Cassandra keeps the data in a row format, like this:

RowKey: Alaska
(name=David:Fronta, value=, timestamp=11223344...)
(name=John:Cannon, value=, timestamp=123455...)

Above is an example of how Cassandra actually stores the data; in the table view it looks like this instead:

country | first_name | last_name
Alaska  | David      | Fronta
Alaska  | John       | Cannon

The RowKey in a column family is the partition key in a table, and if you have clustering columns, their values are kept in the column name as col1:col2:...

Cassandra still keeps the data in the row format: each row (in the row format) is a partition (in the table view).
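
To make the mapping concrete, here is a small Python sketch that flattens the row-format example above into the table view. It is only a toy model of the idea, not Cassandra code: the `storage` dict, the `to_table_rows` helper, and the placeholder timestamps are all made up for illustration.

```python
# Storage view: one wide row per RowKey (partition key). Each cell name holds
# the clustering values joined with ":"; the cell values are empty here.
# The timestamps are placeholders, not the ones from the example above.
storage = {
    "Alaska": [
        {"name": "David:Fronta", "value": "", "timestamp": 1},
        {"name": "John:Cannon", "value": "", "timestamp": 2},
    ],
}


def to_table_rows(storage, partition_column, clustering_columns):
    """Flatten wide rows into table-style rows (hypothetical helper)."""
    rows = []
    for row_key, cells in storage.items():
        for cell in cells:
            # Split the composite cell name back into clustering values.
            parts = cell["name"].split(":")
            row = {partition_column: row_key}
            row.update(zip(clustering_columns, parts))
            rows.append(row)
    return rows


table = to_table_rows(storage, "country", ["first_name", "last_name"])
for row in table:
    print(row)
```

Both table rows come out of the single "Alaska" storage row, which is why that one storage row corresponds to one partition.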


So "What is the main difference between partition and column family in Cassandra?"

The answer is "just what it's called and how it's displayed".

madooc

No.

A column family, now called a table (since CQL replaced Thrift), is a table whose data is stored across the nodes of your Cassandra cluster.

How a table's data is distributed across nodes is the job of the partitioner. The partitioning mechanism therefore has nothing to do with the concept of a table: from the outside, you are not supposed to know whether your data is saved on node 1, node 2, or node 3.

Finally, the partitioner is defined for the cluster as a whole. Among other things, it determines whether your rows will be sorted (which is not a good idea, because the number of rows saved on a given node will then not be well balanced).
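
As a rough illustration of why sorted placement tends to balance poorly, here is a toy Python comparison of hash-based and order-preserving placement. Everything in it is an assumption made for the sketch: the three node names, the modulo-based node choice (a real cluster assigns token ranges on a ring), and the first-letter split used for the ordered case. It does not reproduce Cassandra's actual partitioner code.

```python
import hashlib
from collections import Counter

NODES = ["node1", "node2", "node3"]  # hypothetical three-node cluster


def hashed_node(partition_key):
    """Hash-based placement: token = MD5(key), node = token mod node count.

    Simplified: a real cluster maps tokens to ranges on a ring instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    token = int(digest, 16)
    return NODES[token % len(NODES)]


def ordered_node(partition_key):
    """Order-preserving placement: split the key space by first letter."""
    first = partition_key[0].lower()
    if first < "i":
        return NODES[0]
    if first < "r":
        return NODES[1]
    return NODES[2]


keys = ["Alaska", "Alabama", "Arizona", "Arkansas", "California", "Colorado"]

# Where each strategy puts the keys; similar keys all land together when
# placement follows key order.
print("hashed :", Counter(hashed_node(k) for k in keys))
print("ordered:", Counter(ordered_node(k) for k in keys))
```

With these keys, the ordered strategy sends everything to node1 because every key starts with "A" or "C", while the hashed strategy ignores how similar the keys are.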

For additional information, you may want to search for the word "partition" on this page:

http://wiki.apache.org/cassandra/Operations

Alexis Wilke
  • Alexis, should I understand from this that Cassandra is no longer a "column family database"? The docs state that Cassandra is a row-partitioned database store. – Hassam Abdelillah Mar 26 '16 at 21:54
  • The basic implementation concepts remain the same, yes; it is still very much a data store modeled on Bigtable, hence the "column family" concept. They just renamed it "table". The rows are what get partitioned between nodes, so the concepts of tables and partitions (rows) are separate. One constraint, though, is that the ordering of rows is defined by the partitioner, and that's a cluster-wide setting. – Alexis Wilke Mar 26 '16 at 23:38