I am very new to Scala and I have a CSV file:
MSH ModZId ModProd Date
1140000 zzz abc 2/19/2018
1140000 zzz abc 2/19/2018
651 zzz abc 2/19/2018
651 zzz abc 2/19/2018
1140000 zzz abc 2/19/2018
860000 zzz mno 2/26/2018
860000 zzz mno 2/26/2018
122 zzz mno 2/26/2018
122 zzz mno 2/26/2018
860000 zzz mno 2/26/2018
1140000 zzz pxy 2/19/2018
1140000 zzz pxy 2/19/2018
I need to partition the CSV file by date and write each partition as Parquet, like below:
Folder name: 2018/02/19
Parquet file 1 output:
MSH ModZId ModProd Date
1140000 zzz abc 2/19/2018
1140000 zzz xyz 2/19/2018
651 zzz def 2/19/2018
651 zzz ghi 2/19/2018
1140000 zzz klm 2/19/2018
Parquet file 2 output:
MSH ModZId ModProd Date
1140000 zzz pxy 2/19/2018
1140000 zzz pxy 2/19/2018
Folder name: 20180226
MSH ModZId ModProd Date
860000 zzz mno 2/26/2018
860000 zzz pqr 2/26/2018
122 zzz stu 2/26/2018
122 zzz wxy 2/26/2018
860000 zzz ijk 2/26/2018
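Both folder-name styles shown above (2018/02/19 and 20180226) can be derived from the Date column. Here is a minimal sketch in plain Scala, assuming the dates are always in M/d/yyyy form. The names DateFolders, nestedFolder and compactFolder are hypothetical, made up for illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateFolders {
  // Assumption: input dates look like "2/19/2018" (month/day/year, no zero padding required).
  private val inputFormat = DateTimeFormatter.ofPattern("M/d/yyyy")
  private val nestedFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")
  private val compactFormat = DateTimeFormatter.ofPattern("yyyyMMdd")

  // "2/19/2018" becomes "2018/02/19" (one directory level per date part).
  def nestedFolder(date: String): String =
    LocalDate.parse(date, inputFormat).format(nestedFormat)

  // "2/26/2018" becomes "20180226" (a single directory name).
  def compactFolder(date: String): String =
    LocalDate.parse(date, inputFormat).format(compactFormat)
}
```

Either function would give a stable, zero-padded folder name; which of the two styles is wanted is not settled by the example above.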
I am trying this, but I am not sure how to iterate over the DataFrame:
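import org.apache.spark.sql.SaveMode
// Assumes `spark` is the active SparkSession, `df` is the DataFrame read
// from the CSV, and `Path` is the output directory.
import spark.implicits._ // enables the $"column" syntax

// Distinct (ModProd, Date) combinations, for inspection only.
val writeDF = df
  .select($"ModProd", $"Date").distinct().orderBy($"ModProd", $"Date")
writeDF.show()

// Write one sub-folder per distinct Date value.
df
  .write
  .mode(SaveMode.Overwrite)
  .format("parquet")
  .partitionBy("Date")
  .save(Path)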
Can anyone please help me? I am very new to this and do not know how to partition the CSV file by date in Scala.
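One direction that might work is to skip the manual iteration and let partitionBy do the grouping on a derived column. This is only a sketch under assumptions: the file has a header row, the fields are separated by a single space as in the sample, and dates are M/d/yyyy. PartitionCsvByDate, withDateFolder, convert and the DateFolder column are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, to_date}

object PartitionCsvByDate {
  // Adds a DateFolder column such as "20180219", derived from Date ("2/19/2018").
  // The original Date column is left untouched so it stays in the Parquet files.
  def withDateFolder(df: DataFrame): DataFrame =
    df.withColumn(
      "DateFolder",
      date_format(to_date(col("Date"), "M/d/yyyy"), "yyyyMMdd"))

  // Reads the CSV and writes one sub-directory per distinct DateFolder value.
  def convert(spark: SparkSession, inputPath: String, outputPath: String): Unit = {
    val df = spark.read
      .option("header", "true")
      .option("sep", " ") // assumption: space-separated, as in the sample above
      .csv(inputPath)

    withDateFolder(df)
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("DateFolder")
      .parquet(outputPath)
  }
}
```

With the sample data, this should give directories named DateFolder=20180219 and DateFolder=20180226 under the output path. partitionBy writes column=value directory names, so a bare 20180219 folder would need a rename step afterwards, and the number of Parquet files inside each folder depends on how the DataFrame is partitioned at write time.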