
I'm trying to learn Cassandra, but I'm confused by the terminology.

In many places it says that a row stores key/value pairs.

But when I define a table, it's more like declaring a SQL table: you create a table and specify the column names and data types.

Can someone clarify this?

user1050619
  • Does this answer your query? http://stackoverflow.com/questions/13010225/why-many-refer-to-cassandra-as-a-column-oriented-database – Adrian Lynch Feb 25 '16 at 02:12
  • Still confusing. When I define a table, I specify the columns, correct? Will apples be stored in one table and oranges in a different table, or will a table "Fruit" contain both oranges and apples? – user1050619 Feb 25 '16 at 02:22
  • Also: http://stackoverflow.com/questions/28501131/is-cassandra-a-key-value-store-or-wide-column-store/ – Aaron Feb 25 '16 at 14:03

2 Answers


Cassandra is a column-based NoSQL database. While, yes, at its lowest level it does store simple key-value pairs, it stores these key-value pairs grouped into collections. This grouping of keys and collections is analogous to rows and columns in a traditional relational model. Cassandra tables have a schema and can be queried (with restrictions) using a SQL-like language called CQL.
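To illustrate that the key-value view and the table view describe the same data, here is a minimal Python sketch. It is only an illustration of the idea, not Cassandra's actual storage engine, and the schema and sample values are hypothetical, borrowed from the fruit example in this thread:

```python
# Hypothetical schema declared up front, as you would in CQL (types omitted).
SCHEMA = ["fruit", "location", "cost"]

# Low-level view: a row is a collection of column-name -> value pairs.
row_as_pairs = {"fruit": "apple", "location": "Paris", "cost": 1.5}


def as_tuple(row: dict, schema: list) -> tuple:
    """Relational-style view: order the values by the declared schema.
    Columns with no stored value come back as None."""
    return tuple(row.get(col) for col in schema)


def as_pairs(values: tuple, schema: list) -> dict:
    """Key-value view: keep only the columns that actually have a value."""
    return {col: v for col, v in zip(schema, values) if v is not None}


print(as_tuple(row_as_pairs, SCHEMA))                # ('apple', 'Paris', 1.5)
print(as_pairs(("orange", "Madrid", None), SCHEMA))  # {'fruit': 'orange', 'location': 'Madrid'}
```

The schema tells you which column names are allowed, while the stored row is just the set of name/value pairs that were actually written.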

In your comment you ask whether apples are stored in a different table from oranges. The answer to that specific question is no: they will be in the same table. However, Cassandra tables have an additional concept called the Partition Key that doesn't really have an analogous concept in the relational world. Take, for example, the following table definition:

CREATE TABLE fruit_types (fruit text, location text, cost float, PRIMARY KEY ((fruit), location));
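A minimal Python sketch of what this definition means for the apples-and-oranges question. It assumes a simplified in-memory model (partition key → clustering key → remaining columns) rather than Cassandra's real storage engine, and the locations and prices are made-up sample data:

```python
from collections import defaultdict

# Simplified model of the fruit_types table (an assumption for illustration):
# partition key (fruit) -> {clustering key (location) -> other columns}.
fruit_types = defaultdict(dict)


def insert(fruit: str, location: str, cost: float) -> None:
    # PRIMARY KEY ((fruit), location): fruit selects the partition,
    # location identifies the row inside that partition.
    fruit_types[fruit][location] = {"cost": cost}


insert("apple", "Paris", 1.50)
insert("apple", "Berlin", 1.20)
insert("orange", "Madrid", 0.80)

# One table, two partitions: apples and oranges are not split into tables.
print(sorted(fruit_types))             # ['apple', 'orange']
print(fruit_types["apple"]["Berlin"])  # {'cost': 1.2}
```

Writing the same (fruit, location) pair again simply overwrites the earlier value, which mirrors the fact that CQL INSERTs behave as upserts.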

In this table definition you will notice that we are defining the schema for the table. You will also notice that we are defining a PRIMARY KEY. This primary key is similar to, but not exactly like, the relational concept. In Cassandra the PRIMARY KEY is made up of two parts: the PARTITION KEY and the CLUSTERING COLUMNS. The PARTITION KEY is the first element of the PRIMARY KEY and can contain one or more columns, enclosed in an inner set of parentheses. The PARTITION KEY is hashed to determine which node (or nodes, with replication) owns the data, and all rows that share the same partition key are stored together on disk as one partition. The CLUSTERING COLUMNS are the remaining columns listed in the PRIMARY KEY; among other things, they define the order in which rows are physically stored on disk within each partition. In this example, fruit is the partition key, so all apple rows form one partition and all orange rows form another, while location is the clustering column that orders the rows inside each partition. I suggest you do some additional reading on the PRIMARY KEY here if you're interested in more detail:

https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html
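A rough Python sketch of the two roles of the PRIMARY KEY parts. It assumes a toy four-node ring with evenly spaced tokens and uses MD5 only because it is in the standard library (Cassandra's default partitioner is Murmur3); replication is ignored, so each key maps to a single owner. The node names and sample rows are hypothetical:

```python
import bisect
import hashlib

# Hypothetical 4-node ring with evenly spaced tokens in [0, 2**128).
RING = sorted((i * 2**126, f"node-{name}") for i, name in enumerate("ABCD"))
TOKENS = [t for t, _ in RING]


def token(partition_key: str) -> int:
    """Hash the partition key to a 128-bit token (MD5 stands in for Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


def owner(partition_key: str) -> str:
    """First node whose token is >= the key's token, wrapping around the ring."""
    i = bisect.bisect_left(TOKENS, token(partition_key))
    return RING[i % len(RING)][1]


def partition_rows(rows, partition_key):
    """Rows of one partition, ordered by the clustering column (location)."""
    return sorted((r for r in rows if r[0] == partition_key), key=lambda r: r[1])


rows = [("apple", "Paris", 1.5), ("orange", "Madrid", 0.8), ("apple", "Berlin", 1.2)]
print(owner("apple"), owner("orange"))  # every apple row goes to the same node
print(partition_rows(rows, "apple"))    # [('apple', 'Berlin', 1.2), ('apple', 'Paris', 1.5)]
```

The partition key alone decides placement, so all rows for one fruit land together; the clustering column only affects ordering inside that partition.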

bechbd

Basically, Cassandra storage is like a sparse matrix. Earlier versions had a command-line tool called cassandra-cli that could show the exact storage footprint of your column family (known as a table in recent versions). Later, the community settled on an RDBMS-like syntax to make things easier to understand, since the query language (CQL) is similar to SQL.

The main storage is keyed by partition: the partition key (built from the chosen partition column(s) of your table) is hashed to place the data, and the rest of the columns are attached to it like entries in a sparse matrix.
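To make the sparse-matrix picture concrete, here is a simplified Python sketch of the pre-3.0 storage layout for the fruit_types table from the other answer: one storage row per partition key, with each cell named "<clustering value>:<column name>". It is an approximation; the real engine also kept bookkeeping cells (such as a row marker, so a row with no regular values still existed) and timestamps that are left out here, and the sample rows are hypothetical:

```python
def to_storage_rows(rows):
    """Flatten CQL rows (fruit, location, cost) into one wide storage row
    per partition key, using composite "<location>:<column>" cell names."""
    storage = {}
    for fruit, location, cost in rows:
        cells = storage.setdefault(fruit, {})
        if cost is not None:  # sparse: values never written are not stored
            cells[f"{location}:cost"] = cost
    return storage


rows = [("apple", "Paris", 1.5), ("apple", "Berlin", None), ("orange", "Madrid", 0.8)]
print(to_storage_rows(rows))
# {'apple': {'Paris:cost': 1.5}, 'orange': {'Madrid:cost': 0.8}}
```

Each partition becomes one "wide row" whose cells can differ from partition to partition, which is why the layout behaves like a sparse matrix even though CQL presents it as a regular table.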

Gomes