2

I have the following table:

 create table stocks
(exchange string,symbol string,date string,open float)
partitioned by (exch string,sym string)
clustered by (date) into 5 buckets
row format delimited fields terminated by ',';

My question is: how is the data stored in HDFS? Would there be 5 buckets (subdirectories) inside each of the two partition levels (10 buckets in total), or 5 subdirectories inside each partition?

I tried creating this table in Hive, but was not successful.

K S Nidhin
anjali
  • Welcome to SO. I wonder where the table is. It would be great if you could provide the table. Then someone will be able to help you. – jazzurro Sep 29 '14 at 02:36

2 Answers

2

Hi, the create statement should look like the one below, since I believe DATE is a reserved keyword in Hive and so cannot be used as a column name unquoted.

CREATE TABLE stocks(exchange STRING, symbol STRING, day STRING, open FLOAT)
PARTITIONED BY(exch STRING, sym STRING)
CLUSTERED BY(day) INTO 5 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

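In HDFS, each combination of partition values gets its own directory, and the buckets are stored as files inside that directory, not as subdirectories:

/user/hive/warehouse/<DB_NAME>.db/stocks/exch=<value>/sym=<value>/000000_0

So every exch/sym partition will contain 5 such bucket files (typically named 000000_0 to 000004_0), provided bucketing is enforced when the data is inserted. For the default database the table sits directly under /user/hive/warehouse/stocks.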
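To check the layout on your own cluster, one option is to load a few rows and list the table directory. The following is only a sketch: `stocks_raw` is a hypothetical staging table with the columns exchange, symbol, day and open, and the path assumes the default database and the default warehouse location.

```sql
-- Needed on Hive versions before 2.0; later versions always enforce bucketing.
SET hive.enforce.bucketing = true;

-- Let Hive create the exch/sym partitions from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- stocks_raw is a hypothetical staging table, not part of the question.
-- The partition columns must come last in the SELECT list.
-- `exchange` is quoted with backticks because newer Hive versions reserve it.
INSERT OVERWRITE TABLE stocks PARTITION (exch, sym)
SELECT `exchange`, symbol, day, open,
       `exchange` AS exch,
       symbol     AS sym
FROM stocks_raw;

-- List the result from the Hive CLI (path assumes the default database).
dfs -ls -R /user/hive/warehouse/stocks;
```

The listing should show one exch=.../sym=... directory per partition, each holding the bucket files for that partition.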

If you are looking for something more precise, you can refer to this question: What is the difference between partitioning and bucketing a table in Hive?

Thanks.

Serg
scalauser
0

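The data stored in HDFS would have 5 bucket files inside each partition directory, since the table is declared with 5 buckets. The partition columns (exch, sym) form the directories; the bucketing column (date) only decides which file a row lands in. For one partition the structure would look like:

<hdfs_path>/exch=<exch1>/sym=<sym1>/000000_0
<hdfs_path>/exch=<exch1>/sym=<sym1>/000001_0
<hdfs_path>/exch=<exch1>/sym=<sym1>/000002_0
<hdfs_path>/exch=<exch1>/sym=<sym1>/000003_0
<hdfs_path>/exch=<exch1>/sym=<sym1>/000004_0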

Refer to this for more details.
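Hive can also report where a partition lives, which avoids guessing the path. A small sketch, assuming the corrected table with the `day` column from the other answer; the partition values 'NYSE' and 'IBM' are hypothetical:

```sql
-- List the partitions Hive knows about for the table.
SHOW PARTITIONS stocks;

-- The "Location" field in the output is the HDFS directory of this partition.
-- 'NYSE' and 'IBM' are placeholder values; substitute ones that exist in your data.
DESCRIBE FORMATTED stocks PARTITION (exch = 'NYSE', sym = 'IBM');

-- Read only the first of the 5 buckets within that partition.
SELECT *
FROM stocks TABLESAMPLE (BUCKET 1 OUT OF 5 ON day)
WHERE exch = 'NYSE' AND sym = 'IBM';
```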

Community
K S Nidhin