
I'm trying to get a clear understanding of HBase.

Hive: it just creates a tabular structure over the underlying files in HDFS, so that the user gets querying abilities on those HDFS files. Correct me if I'm wrong here.
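
For example, this is roughly what I mean. A minimal sketch using the Hive JDBC driver (the HiveServer2 address, the `weblogs` table, and the HDFS path are made up for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOverHdfs {
    public static void main(String[] args) throws Exception {
        // Older hive-jdbc versions may need the driver loaded explicitly.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Map a table onto files that already sit in HDFS;
            // Hive does not move or rewrite them.
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS weblogs (" +
                "  ip STRING, ts STRING, url STRING) " +
                "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' " +
                "LOCATION '/data/weblogs'");

            // Plain SQL over those files; under the hood Hive compiles
            // this into a MapReduce job.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT url, COUNT(*) FROM weblogs GROUP BY url")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```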

HBase: again, we create a similar table structure, but in a somewhat more structured (column-oriented) way, again over the HDFS file system.

Aren't they both the same, considering the type of job they do, except that Hive runs on MapReduce?

Also, is it true that we can't create an HBase table over an already existing HDFS file?

Ragav
  • possible duplicate of [How does Hive compare to HBase?](http://stackoverflow.com/questions/24179/how-does-hive-compare-to-hbase) – david25272 Feb 18 '14 at 23:45

4 Answers


Hive shares a very similar structure with a traditional RDBMS (though not entirely), and HQL syntax is almost identical to SQL, which is good for database programmers from a learning perspective. HBase, on the other hand, is completely different in the sense that it can be queried only on the basis of its row key.

If you want to design a table in an RDBMS, you follow a structured approach to defining columns, concentrating mostly on attributes. In HBase the whole design is centered on the data: we design the table around the type of query that will be run, and the columns are dynamic, changing at runtime (a core feature of NoSQL).
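
For example, here is a minimal sketch of such runtime-defined columns, using the HBase 2.x Java client (the `user_events` table, the `events` family, and the qualifiers are invented for illustration; it assumes a running cluster reachable via the default configuration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DynamicColumns {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("user_events"))) {

            // The schema only declares the column family ("events");
            // the row key drives the whole design.
            Put put = new Put(Bytes.toBytes("user42"));

            // Column qualifiers are just bytes chosen at write time --
            // each row can carry a completely different set of columns.
            put.addColumn(Bytes.toBytes("events"),
                          Bytes.toBytes("login-2014-02-18T10:00"),
                          Bytes.toBytes("ip=10.0.0.1"));
            put.addColumn(Bytes.toBytes("events"),
                          Bytes.toBytes("purchase-2014-02-18T10:05"),
                          Bytes.toBytes("order=1234"));
            table.put(put);
        }
    }
}
```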

user2617319

You said: "aren't they both the same considering the type of job they do, except that Hive runs on MapReduce?" It's not that simple. When a Hive query is executed, a MapReduce job is created and triggered. Depending on the data size and complexity it can take considerable time, since for each MapReduce job the JobTracker has a number of steps to go through: initializing tasks like map, combine, shuffle/sort, reduce, and so on.

But when we access HBase, it directly looks up the indexed data based on the specified Scan or Get parameters. In that sense it just acts as a database.
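
As a rough sketch of that direct lookup, using the HBase 2.x Java client (the `webtable` table and `contents:html` column are made-up names for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DirectLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("webtable"))) {

            // Get: a point lookup by row key -- no MapReduce job is started.
            Result row = table.get(new Get(Bytes.toBytes("org.hbase.www")));
            byte[] html = row.getValue(Bytes.toBytes("contents"),
                                       Bytes.toBytes("html"));

            // Scan: iterate a key range between start and stop rows.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("org.apache."))
                    .withStopRow(Bytes.toBytes("org.apache.zzzz"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```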

Tom Sebastian

Hive:

It just creates a tabular structure over the underlying files in HDFS, so that the user gets SQL-like querying abilities on existing HDFS files, with typical latency of up to minutes. However, for the best performance it's recommended to ETL the data into Hive's ORC format.
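
As a sketch of that ETL step (hypothetical names; it assumes a raw text table like the `weblogs` one from the question's example and a HiveServer2 connection), a single CTAS statement rewrites the data into ORC:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class OrcEtl {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Rewrite the raw text table into Hive-managed, columnar,
            // compressed ORC files; later queries scan these instead.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS weblogs_orc " +
                "STORED AS ORC " +
                "AS SELECT * FROM weblogs");
        }
    }
}
```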

HBase:

Unlike Hive, HBase is NOT about running SQL queries over existing data in HDFS.

HBase is a strictly consistent, distributed, low-latency KEY-VALUE STORE.

From The HBase Definitive Guide:

The canonical use case of Bigtable and HBase is the webtable, that is, the web pages stored while crawling the Internet. The row key is the reversed URL of the page—for example, org.hbase.www. There is a column family storing the actual HTML code, the contents family, as well as others like anchor, which is used to store outgoing links, another one to store inbound links, and yet another for metadata like language. Using multiple versions for the contents family allows you to store a few older copies of the HTML, and is helpful when you want to analyze how often a page changes, for example. The timestamps used are the actual times when they were fetched from the crawled website.

The fact that HBase uses HDFS is just an implementation detail: it allows HBase to run on an existing Hadoop cluster and guarantees redundant storage of the data, but it is not a feature in any other sense.

> Also, is it true that we can't create an HBase table over an already existing HDFS file?

No, you can't: internally HBase stores data in its own HFile format, so an already existing HDFS file would first have to be loaded into HBase (for example, with a bulk load).

Evgeny Benediktov

Hive and HBase are completely different things

Hive is a way to create MapReduce jobs for data that resides on HDFS (which can be plain files or HBase tables). HBase is an OLTP-oriented key-value store that resides on HDFS and can itself be used in MapReduce jobs, as in the sketch below.
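
A minimal sketch of using HBase as a MapReduce input source via `TableMapReduceUtil` (the `webtable` table name is made up; the job is map-only and discards its output, purely to show the wiring):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseAsMapReduceSource {
    // Each map() call receives one HBase row as its input record.
    static class RowKeyMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(ImmutableBytesWritable key, Result row, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(new Text(Bytes.toString(key.get())), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-webtable");
        job.setJarByClass(HBaseAsMapReduceSource.class);

        // Wire an HBase table scan in as the job's input.
        TableMapReduceUtil.initTableMapperJob(
                "webtable", new Scan(), RowKeyMapper.class,
                Text.class, IntWritable.class, job);

        job.setNumReduceTasks(0);                       // map-only sketch
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```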

In order for Hive to work, it holds metadata that maps the HDFS data into tabular form (since SQL works on tables).

I guess it is also important to note that in recent versions Hive is evolving beyond being just a SQL-like way to write MapReduce jobs. With what Hortonworks calls the "Stinger initiative", they have added a dedicated file format (ORC) and are improving Hive's performance (e.g. with the upcoming Tez execution engine) to deliver SQL on Hadoop, i.e. a relatively fast way to run analytic queries over data stored on Hadoop.

Arnon Rotem-Gal-Oz