1

I have Hive table partitioned based on date yyyy-mm-dd. I want to run a script everyday that can delete all the partitions older than 30 days but in this case I don't want to pass the date. I want it to take system date and delete the partitions older than 30 days.

Priyanka
  • 25
  • 1
  • 10

1 Answers1

6

You need to use Linux/ Unix to set the variable for the DROP PARTITION date and use it in the ALTER TABLE statement.

Here is an example

CREATE TABLE ramesh.test
(col1 STRING, 
 col2 STRING)
PARTITIONED BY (partition_date date);

INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-01') VALUES ('key1', 'val1');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-02') VALUES ('key2', 'val2');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-11-01') VALUES ('key3', 'val3');

hive --hivevar var_drop_date="$(date -d "30 days ago" +"%Y-%m-%d")" -e 'ALTER TABLE ramesh.test DROP IF EXISTS PARTITION (partition_date <= date "${hivevar:var_drop_date}")'

This drops the 2 partitions from 30 days ago and leaves the one within 30 days.

Let me know whether it works for you. I used CDH for testing this.

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Ramesh
  • 1,405
  • 10
  • 19
  • Thanks Ramesh. I also tried on CDH, it solved my problem. Can you please explain (date -d "30 days ago" +"%Y-%m-%d") and I am confused when to use --hivevar, --hiveconf and --hiveenv – Priyanka Nov 19 '17 at 18:33
  • date command with "30 days ago" gives you the sys date - 30 days. Please see this for hivevar/ conf - https://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts – Ramesh Nov 19 '17 at 20:02
  • Got it. Thanks Ramesh – Priyanka Nov 20 '17 at 05:42
  • @Priyanka - you are welcome. You may want to mark it as the answer here as the solution has worked. – Ramesh Nov 20 '17 at 10:07