3

My hive table is partitioned by date in period of 2 years , and each partition has 200 2mb files in it.

i am able to concatenate running following command "ALTER TABLE table_name partition (partition_column_name ='2017-12-31') concatenate"

Manually it takes more time to run each query, so is there any easy way to do this?

vikrant rana
  • 4,509
  • 6
  • 32
  • 72
manohar g
  • 53
  • 4

1 Answers1

3

Option-1: Select and overwrite same hive table:

Hive supports insert overwrite same table, if you are sure the data inserted in hive table using insert statements only (not loading files through hdfs) then use this option.

hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
hive> Insert overwrite table <partition_table_name> partition(<partition_col>) 
      select * from <db>.<partition_table_name>;

You can also use sort by,distribute by and these additional params to control the number of files created in the table.

Option-2:Using Shell script:

bash$ cat cnct.hql
alter table default.partitn1 partition(${hiveconf:var1} = '${hiveconf:var2}') concatenate

Trigger the above .hql script using shell script(for loop)

bash$ cat trigg.sh
#!/bin/bash
id=`hive -e "show partitions default.partitn"`
echo "partitions: " $id
for f in $id; do
echo "select query for: " $f
#split the partitions on = then assigning to two variables
IFS="=" read var1 var2 <<< $f
#pass the variables and execute the cnct.hql script
hive --hiveconf var1=$var1 --hiveconf var2=$var2 -f cnct.hql
done
notNull
  • 30,258
  • 4
  • 35
  • 50