I have a table in hive whose DDL looks like below:
CREATE TABLE test(id string) PARTITIONED BY (data_updated string,batch_id string,country string,state string);
data_updated=NO/batch_id=102/country='1'/state='A';
data_updated=NO/batch_id=103/country='2'/state='B';
data_updated=NO/batch_id=104/country='3'/state='C';
I want to move the data from data_updated=NO to data_updated=YES. And I do that via HDFS commands.
data_updated=YES/batch_id=102/country='1'/state='A'
data_updated=YES/batch_id=103/country='2'/state='B'
data_updated=YES/batch_id=104/country='3'/state='C'
One option is to write the multiple rename partition DDLs. But, alter partition DDLs require all partitions to be specified upto Nth level. N-level is dynamic and cannot be statically determined.
I am getting the batch_ids with the help of unix script and moving the directory manually using move command in HDFS.
Inside a loop:
hadoop fs -mv /table/directory/data_processed=NO/batch_id=$i /table/directory/data_processed=YES/
The problem with moving with HDFS commands is after repairing the table in hive, the non existing partitions still exists in hive metadata. How to remove non-existing partitions from hive metadata. Any suggestions?
Output of show partition in hive.
show partitions test;
data_updated=NO/batch_id=102/country='1'/state='A'; # NON-EXISTING
data_updated=NO/batch_id=103/country='2'/state='B'; # NON-EXISTING
data_updated=NO/batch_id=104/country='3'/state='C'; # NON-EXISTING
data_updated=YES/batch_id=102/country='1'/state='A'
data_updated=YES/batch_id=103/country='2'/state='B'
data_updated=YES/batch_id=104/country='3'/state='C'