
My partitioning is based on year/month/day. Using SimpleDateFormat with the week-year pattern created a wrong partition: the data for the date 2017-12-31 was moved to 2018-12-31 because the format used YYYY (week year) instead of yyyy.

   SimpleDateFormat sdf = new SimpleDateFormat("YYYY-MM-dd"); // YYYY is week year; yyyy-MM-dd is the correct pattern
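A minimal sketch of the difference (assuming the default US week rules, under which 2017-12-31, a Sunday, already falls into week-year 2018):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;

public class WeekYearDemo {
    public static void main(String[] args) {
        Calendar cal = Calendar.getInstance(Locale.US);
        cal.set(2017, Calendar.DECEMBER, 31); // a Sunday, the last day of 2017

        // YYYY is "week year": with US week rules this date belongs to the
        // first week of 2018, so the year component comes out as 2018.
        SimpleDateFormat weekYear = new SimpleDateFormat("YYYY-MM-dd", Locale.US);
        // yyyy is the plain calendar year, which is what partitioning needs.
        SimpleDateFormat calendarYear = new SimpleDateFormat("yyyy-MM-dd", Locale.US);

        System.out.println(weekYear.format(cal.getTime()));     // 2018-12-31
        System.out.println(calendarYear.format(cal.getTime())); // 2017-12-31
    }
}
```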

So what I want is to move my data from partition 2018/12/31 to partition 2017/12/31 of the same table. I did not find any relevant documentation on how to do this.

Aditya Goel

2 Answers


There is a JIRA ticket related to this: https://issues.apache.org/jira/browse/SPARK-19187. Upgrading your Spark version to 2.0.1 should fix the problem.

hlagos

From what I understood, you would like to move the data from the 2018/12/31 partition to 2017/12/31. Below is how you can do it.

#From Hive/Beeline (string-typed partition values must be quoted)
ALTER TABLE TableName PARTITION (PartitionCol='2018-12-31') RENAME TO PARTITION (PartitionCol='2017-12-31');

From Spark code, you basically have to initiate the HiveContext and run the same HQL from it. You can refer to one of my answers here on how to initiate the Hive context.
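As an illustration, here is a small helper that builds the same rename HQL; you would then pass the result to spark.sql(...) on a SparkSession built with .enableHiveSupport() (the table and column names below are placeholders, not from the question):

```java
public class PartitionHql {
    // Builds the ALTER TABLE ... RENAME TO PARTITION statement. Note that
    // string-typed partition values must be quoted. Run the result with
    // spark.sql(renamePartitionHql(...)) on a Hive-enabled SparkSession so
    // the change reaches the Hive metastore.
    static String renamePartitionHql(String table, String col, String from, String to) {
        return String.format(
            "ALTER TABLE %s PARTITION (%s='%s') RENAME TO PARTITION (%s='%s')",
            table, col, from, col, to);
    }

    public static void main(String[] args) {
        System.out.println(renamePartitionHql(
            "schema.TableName", "PartitionCol", "2018-12-31", "2017-12-31"));
    }
}
```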

#If you want to do it at the HDFS level, below is one of the approaches
#From Hive/Beeline run the below HQL
ALTER TABLE TableName ADD IF NOT EXISTS PARTITION (PartitionCol='2017-12-31');

#Now from HDFS, just move the data in the 2018 partition to the 2017 partition
hdfs dfs -mv /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31/* /your/table_hdfs/path/schema.db/tableName/PartitionCol=2017-12-31/

#Remove the 2018 partition if you no longer need it
hdfs dfs -rm -r /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31

#You can also drop from beeline/hive
alter table tableName drop if exists partition (PartitionCol='2018-12-31');

#At the end repair the table
msck repair table tableName;

Why do I have to repair the table?

roh