Apache Hive Partition & Bucketing Structure

Question

In Apache Hive, what does the directory structure look like after a huge dataset is partitioned and then bucketed?

For example, I have a customer dataset for a country; the data is partitioned by state and then bucketed by city. How can we find out how many files will be present in a city bucket?

score 1 · Answer 1 · answered Jan 26 '20 at 08:08

A partition is a directory, and each partition corresponds to a specific value of the partitioned column.

Within a table or a partition directory, buckets are stored as files. The number of buckets is fixed when the table is created with CLUSTERED BY (column) INTO K BUCKETS. There will be ONE file for each individual bucket. Hive assigns each record to a bucket by hashing the value of the bucketing column and taking that hash modulo the number of buckets K.
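To make the layout concrete, here is a rough Python sketch of the rule described above, not Hive's actual code. It assumes a hypothetical table directory `/warehouse/customers`, a Java-style string hash as a stand-in (the real hash depends on the Hive version and column type), and the usual `000000_0`-style bucket file names written by a single load.

```python
# Illustrative sketch only: mimics the layout of a table that is
# PARTITIONED BY (state) and CLUSTERED BY (city) INTO K BUCKETS.
# The hash is a Java-style stand-in (an assumption), not Hive's exact function.

def java_style_hash(text: str) -> int:
    """31-based rolling hash over UTF-8 bytes, wrapped to a signed 32-bit int."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (31 * h + byte) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Clear the sign bit, then take the hash modulo the bucket count."""
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


def bucket_file_path(table_dir: str, state: str, city: str, num_buckets: int) -> str:
    """The partition value picks the directory; the bucket id picks the file."""
    bucket = bucket_for(city, num_buckets)
    return f"{table_dir}/state={state}/{bucket:06d}_0"


# Hypothetical sample rows: (state, city)
rows = [("CA", "Los Angeles"), ("CA", "San Diego"), ("TX", "Austin"), ("TX", "Houston")]
K = 4  # number of buckets declared in the table definition

for state, city in rows:
    print(city, "->", bucket_file_path("/warehouse/customers", state, city, K))
```

Under these assumptions each `state=...` directory holds at most K files, and one bucket file can contain several cities, because different city names may hash to the same bucket.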

score 0 · Answer 2 · answered Jan 26 '20 at 14:59

The maximum number of buckets is 256. For more details, kindly refer to the link below:

What is the difference between partitioning and bucketing a table in Hive?

saravanatn
