I have Hive table partitioned based on date yyyy-mm-dd. I want to run a script everyday that can delete all the partitions older than 30 days but in this case I don't want to pass the date. I want it to take system date and delete the partitions older than 30 days.
Asked
Active
Viewed 5,029 times
1
-
Is your hive table managed or external table? – Ramesh Nov 19 '17 at 13:36
-
What do you use to run your hive scripts? UNIX, ADF etc? – Ramesh Nov 19 '17 at 13:37
-
It's managed orc table and I have written scripts in hql. I can use linux or unix scripts that is not a problem. Can you help me out – Priyanka Nov 19 '17 at 14:29
-
Please share what attempts you've done already – OneCricketeer Nov 19 '17 at 16:39
-
Tried one script but it deletes all the partition older than current date. I tried to give pass the value 30 but it throws error -> hive --hivevar today=$(date +"%Y-%m-%d") -e \ > 'alter table local_db.mytable drop partition (day
– Priyanka Nov 19 '17 at 16:59 -
Well, yeah. You're not passing the date of 30 days ago – OneCricketeer Nov 19 '17 at 18:03
-
You just missed the 30 days ago. Please see the answer below. – Ramesh Nov 19 '17 at 18:04
1 Answers
6
You need to use Linux/ Unix to set the variable for the DROP PARTITION date and use it in the ALTER TABLE statement.
Here is an example
CREATE TABLE ramesh.test
(col1 STRING,
col2 STRING)
PARTITIONED BY (partition_date date);
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-01') VALUES ('key1', 'val1');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-10-02') VALUES ('key2', 'val2');
INSERT INTO TABLE ramesh.test PARTITION (partition_date='2017-11-01') VALUES ('key3', 'val3');
hive --hivevar var_drop_date="$(date -d "30 days ago" +"%Y-%m-%d")" -e 'ALTER TABLE ramesh.test DROP IF EXISTS PARTITION (partition_date <= date "${hivevar:var_drop_date}")'
This drops the 2 partitions from 30 days ago and leaves the one within 30 days.
Let me know whether it works for you. I used CDH for testing this.

OneCricketeer
- 179,855
- 19
- 132
- 245

Ramesh
- 1,405
- 10
- 19
-
Thanks Ramesh. I also tried on CDH, it solved my problem. Can you please explain (date -d "30 days ago" +"%Y-%m-%d") and I am confused when to use --hivevar, --hiveconf and --hiveenv – Priyanka Nov 19 '17 at 18:33
-
date command with "30 days ago" gives you the sys date - 30 days. Please see this for hivevar/ conf - https://stackoverflow.com/questions/12464636/how-to-set-variables-in-hive-scripts – Ramesh Nov 19 '17 at 20:02
-
-
@Priyanka - you are welcome. You may want to mark it as the answer here as the solution has worked. – Ramesh Nov 20 '17 at 10:07