I am not clear on the difference between partitioning and bucketing in Hive, and would really appreciate it if you could provide some details with an example.
Check this out: http://stackoverflow.com/questions/19128940/what-is-the-difference-between-partitioning-and-bucketing-a-table-in-hive/19131221#19131221 – Navneet Kumar Oct 06 '13 at 16:20
1 Answer
Here is a summary of the difference between bucketing and partitioning.
Both partitioning and bucketing slice the data so that a query can run much more efficiently than on unsliced data. The major difference is that with partitioning the number of slices keeps changing as data is modified, because each distinct value of the partition column gets its own slice, whereas with bucketing the number of slices is fixed and is specified when the table is created.
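To make the "changing versus fixed number of slices" point concrete, here is a minimal Python sketch. It is not Hive code: the row layout, the column names (`country`, `user_id`) and the helper functions are made up for illustration, and `user_id % num_buckets` is only a stand-in for whatever hash Hive applies to the bucketing column.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumption: chosen once, when the table is created


def partition_slices(rows, column):
    """One slice per distinct value of the partition column."""
    slices = defaultdict(list)
    for row in rows:
        slices[row[column]].append(row)
    return dict(slices)


def bucket_slices(rows, column, num_buckets=NUM_BUCKETS):
    """Always exactly num_buckets slices, however many rows arrive."""
    slices = [[] for _ in range(num_buckets)]
    for row in rows:
        # Stand-in for hash(column) % num_buckets; works for integer keys.
        slices[row[column] % num_buckets].append(row)
    return slices


rows = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "IN"},
    {"user_id": 3, "country": "US"},
]
partitions_before = len(partition_slices(rows, "country"))
buckets_before = len(bucket_slices(rows, "user_id"))

# A row with a new country arrives: a new partition appears,
# but the number of buckets stays the same.
rows.append({"user_id": 4, "country": "DE"})
partitions_after = len(partition_slices(rows, "country"))
buckets_after = len(bucket_slices(rows, "user_id"))

print(partitions_before, partitions_after)
print(buckets_before, buckets_after)
```

In this sketch the partition count follows the data, while the bucket count only changes if `NUM_BUCKETS` itself is changed.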
Bucketing works by applying a hash function to the bucketing column and then taking the result modulo the number of buckets. So a row can land in any of the buckets, depending on its hash, but rows with the same value always land in the same bucket. Bucketing can be used for sampling data, and also for joining two data sets much more efficiently.
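The hash-then-modulo step, and why it helps with sampling and joins, can be sketched in Python as follows. This is an illustration under assumptions, not Hive's implementation: `simple_hash` is a Java-style 31-multiplier string hash used as a stand-in (Hive's actual hash depends on the column type), and the table contents and function names are hypothetical.

```python
def simple_hash(key: str) -> int:
    """Stand-in hash: Java-style 31-multiplier over the UTF-8 bytes."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (31 * h + byte) & 0xFFFFFFFF
    return h


def bucket_for(key: str, num_buckets: int) -> int:
    """Hash the key, clear the sign bit, then take modulo num_buckets."""
    return (simple_hash(key) & 0x7FFFFFFF) % num_buckets


def bucketize(rows, column, num_buckets):
    """Distribute rows into a fixed number of buckets by the given column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


def bucket_join(left, right, column, num_buckets):
    """Join two tables bucketed the same way: bucket i only meets bucket i."""
    joined = []
    left_buckets = bucketize(left, column, num_buckets)
    right_buckets = bucketize(right, column, num_buckets)
    for lb, rb in zip(left_buckets, right_buckets):
        for l_row in lb:
            for r_row in rb:
                if l_row[column] == r_row[column]:
                    joined.append({**l_row, **r_row})
    return joined


orders = [
    {"user": "alice", "item": "book"},
    {"user": "bob", "item": "pen"},
    {"user": "alice", "item": "lamp"},
]
users = [
    {"user": "alice", "city": "Oslo"},
    {"user": "bob", "city": "Pune"},
]

result = bucket_join(orders, users, "user", num_buckets=4)

# Sampling: reading a single bucket gives a hash-based subset of the table.
sample = bucketize(orders, "user", 4)[0]
```

Because equal keys always hash to the same bucket number, the join never has to compare rows from different buckets, and a sample can be taken by reading one bucket instead of scanning the whole table.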

Praveen Sripati