I'm changing an HDFS directory structure. The current layout is as follows:
.../customers/customers1/2016-05-16-10/lots_of_files1.csv
.../customers/customers2/2016-05-16-10/lots_of_files2.csv
.../customers/customers3/2016-05-16-10/lots_of_files1.csv
.../customers/customers4/2016-05-16-10/...
.../customers/customers5/2016-05-16-10/...
.../customers/customers6/2016-05-16-10/...
.../customers/customers7/2016-05-16-10/...
I'd like to get rid of the customers1 through customers7 level, so that the date directories sit directly under customers:
.../customers/2016-05-16-10/lots_of_files1.csv
.../customers/2016-05-16-10/lots_of_files2.csv
.../customers/2016-05-16-10/lots_of_files1(1).csv
I thought of using the snakebite Python HDFS library, but several edge cases arise:
1. The same date directory may occur more than once (under several of the customersN directories).
2. The same CSV file name may occur more than once, but its data is different, so it must be moved as well.
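To make the renaming rule concrete, here is a rough sketch of the mapping I have in mind. It is pure Python and does not touch HDFS at all: it only computes (source, destination) pairs. The function name `plan_moves`, the `existing` parameter, and the `name(1).csv` suffix scheme are my own assumptions for illustration, and it assumes every source path has the shape `<root>/customersN/<date>/<file>`.

```python
import posixpath


def plan_moves(src_paths, existing=()):
    """Plan moves from <root>/customersN/<date>/<file> to <root>/<date>/<file>.

    Hypothetical helper: returns a list of (src, dst) pairs and performs no I/O.
    `existing` holds destination paths that are already taken.
    """
    taken = set(existing)
    moves = []
    for src in sorted(src_paths):
        date_dir, name = posixpath.split(src)        # .../customersN/<date>, <file>
        sub_dir, date = posixpath.split(date_dir)    # .../customersN, <date>
        root = posixpath.dirname(sub_dir)            # drops the customersN component
        stem, ext = posixpath.splitext(name)

        dst = posixpath.join(root, date, name)
        n = 1
        # On a name clash, try name(1).ext, name(2).ext, ... until one is free.
        while dst in taken:
            dst = posixpath.join(root, date, "%s(%d)%s" % (stem, n, ext))
            n += 1

        taken.add(dst)
        moves.append((src, dst))
    return moves


if __name__ == "__main__":
    sample = [
        "customers/customers1/2016-05-16-10/lots_of_files1.csv",
        "customers/customers2/2016-05-16-10/lots_of_files2.csv",
        "customers/customers3/2016-05-16-10/lots_of_files1.csv",
    ]
    for src, dst in plan_moves(sample):
        print(src, "->", dst)
```

Each pair could then be handed to whatever performs the actual rename (for example `hdfs dfs -mv`), after creating the target date directory if it does not exist yet. Keeping the planning step separate means the mapping can be reviewed as a dry run before anything is moved.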
What is the cleanest way to achieve this?