
Folks,

Recently I was reading some blogs about NoSQL column-oriented storage. I am trying my hand at Cassandra and HBase.

What I understood is that data is stored in a column-oriented manner.

e.g. a table with columns Employee Id, Employee Name, Last Name:

 100, 'abc', 'xyz'
 200, 'ABC', 'XYZ'

Then the data will be stored on disk in the following format (column-oriented storage keeps all values of a single column together):

First column     Second column      Third column
100|200        , 'abc'|'ABC'      , 'xyz'|'XYZ'
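A minimal Python sketch of the two layouts, using the example table. It is only an illustration: the string output stands in for on-disk pages, and real engines add encoding, compression, and their own file formats on top.

```python
# Example table from the question: (Employee Id, Employee Name, Last Name)
rows = [
    (100, "abc", "xyz"),
    (200, "ABC", "XYZ"),
]

def row_layout(rows):
    """Row-oriented: all values of one row are stored next to each other."""
    return " , ".join("|".join(str(v) for v in row) for row in rows)

def column_layout(rows):
    """Column-oriented: all values of one column are stored next to each other."""
    columns = zip(*rows)  # transpose the list of rows into columns
    return " , ".join("|".join(str(v) for v in col) for col in columns)

print(row_layout(rows))     # 100|abc|xyz , 200|ABC|XYZ
print(column_layout(rows))  # 100|200 , abc|ABC , xyz|XYZ
```

The column layout reproduces the `100|200 , 'abc'|'ABC' , 'xyz'|'XYZ'` arrangement above, while the row layout keeps each employee's values contiguous.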

1) If we have to retrieve a single row with id = 100, how is that done? Since the data is not contiguous, won't it be costly? (Is there an index on the row key that covers all columns?)

2) Why don't HBase and Cassandra have proper support for aggregation functions, if column-oriented storage is meant for that?

user1927808

1 Answer

Simple answer: HBase and Cassandra aren't column-oriented; they are row-oriented. The difference from traditional databases, however, is that each row is actually a key/value pair: the primary key (PK) maps to an arbitrary number of columns.
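To make the key/value idea concrete, here is a deliberately simplified Python model, assuming a plain dict: each row key maps to its own dictionary of column name to value, so different rows can carry different, arbitrary sets of columns. This is an illustration only, not the actual storage format of HBase or Cassandra (both also deal with column families, timestamps, and sorted on-disk files); `put` and `get_row` are hypothetical helper names.

```python
from typing import Dict

# Simplified wide-row model (illustrative assumption): row key -> {column -> value}.
# Each row stores only the columns it actually has.
Table = Dict[str, Dict[str, str]]

table: Table = {}

def put(table: Table, row_key: str, column: str, value: str) -> None:
    """Insert or overwrite one cell of a row."""
    table.setdefault(row_key, {})[column] = value

def get_row(table: Table, row_key: str) -> Dict[str, str]:
    """Fetch a whole row with a single key lookup; its columns live together."""
    return dict(table.get(row_key, {}))

put(table, "100", "name", "abc")
put(table, "100", "last_name", "xyz")
put(table, "200", "name", "ABC")  # row 200 has no last_name column at all

print(get_row(table, "100"))  # {'name': 'abc', 'last_name': 'xyz'}
print(get_row(table, "200"))  # {'name': 'ABC'}
```

Because all of a row's columns hang off one key, reading a full row is a single lookup, which is the row-oriented behaviour described above.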

Examples of column-oriented databases are Vertica and Teradata.

You are right, however, that retrieving a full row from column-oriented storage is more costly than from a row-oriented DB. But column-oriented DBMSs were invented for analysis, where you usually want to aggregate a few columns over all the data, while row-oriented storage is meant for retrieving (almost) full rows from only a small subset of the data.
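The trade-off can be sketched with a toy column store in Python. The columns are parallel lists, so the i-th entry of every list belongs to the same row (the position acts as an implicit row identifier, so no separate row-key index per column is needed in this toy). Fetching one full row touches every column, while an aggregate over one column reads only that list. The structure, the function names, and the "columns read" counter are illustrative assumptions, not how Vertica, Teradata, or any other product is implemented.

```python
from typing import Dict, List, Tuple

# Toy column store: each column is a separate list, all kept in the same row order.
columns: Dict[str, List] = {
    "id":        [100, 200],
    "name":      ["abc", "ABC"],
    "last_name": ["xyz", "XYZ"],
}

def fetch_row(columns: Dict[str, List], row_id: int) -> Tuple[dict, int]:
    """Rebuild one row: find its position via the id column, then read that
    position from every column. Returns the row and the number of columns read."""
    pos = columns["id"].index(row_id)  # linear scan here; a real engine would avoid this
    row = {name: values[pos] for name, values in columns.items()}
    return row, len(columns)           # every column had to be touched

def sum_column(columns: Dict[str, List], name: str) -> Tuple[int, int]:
    """Aggregate a single column: only that one list is read."""
    return sum(columns[name]), 1

print(fetch_row(columns, 100))    # ({'id': 100, 'name': 'abc', 'last_name': 'xyz'}, 3)
print(sum_column(columns, "id"))  # (300, 1)
```

With three columns the difference is small, but with hundreds of columns and billions of rows, reading one column for an aggregate instead of whole rows is what makes column stores attractive for analysis.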

peter
  • Quite confusing, because if you google it, there seems to be a concept of a column family in HBase and Cassandra. Can you tell me why a column family is required and what the internal storage mechanism is for HBase and Cassandra? – user1927808 Jun 12 '14 at 06:28
  • One more link that describes it as column-oriented: http://www.edureka.in/blog/apache-cassandra-advantages/ – user1927808 Jun 12 '14 at 07:42
  • Is HBase also column-oriented? http://stackoverflow.com/questions/321280/recommendations-for-column-oriented-database – user1927808 Jun 12 '14 at 09:08
  • http://stackoverflow.com/questions/11816609/column-based-or-row-based-for-hbase - according to this, it isn't column-oriented in the same way as Teradata or Vertica. The column family in Cassandra is explained here: http://www.datastax.com/docs/0.8/ddl/column_family. It is basically a set of rows, like a table. – peter Jun 12 '14 at 09:28
  • Maybe the actual problem is understanding how a column-oriented DB works. According to this Wikipedia article, http://en.wikipedia.org/wiki/List_of_column-oriented_DBMSes, even SQL Server 2012 is a column-oriented system. But the professor who gave a lecture on column-oriented databases made it clear to us that those aren't really the same as Teradata and Vertica, which deal with all the problems of optimizing updates and compressing data, something that, AFAIK, neither Cassandra nor HBase implements in that way. – peter Jun 12 '14 at 09:32