I have a CSV dataset that I want to process using Spark. The second column is in this format:
yyyy-MM-dd hh:mm:ss
I want to group the rows by each MM-dd:
val days: RDD[String] = sc.textFile(<csv file>)
val partitioned = days.map(row => {
  // extract "MM-dd" from the "yyyy-MM-dd hh:mm:ss" value in the second column
  row.split(",")(1).substring(5, 10)
}).invertTheMap.groupOrReduceByKey
The result of groupOrReduceByKey should be of the form:

("MM-dd" -> (row1, row2, row3, ..., row_n))
How should I implement invertTheMap and groupOrReduceByKey?
I saw this done in Python here, but I wonder how it is done in Scala.
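
For reference, here is my rough guess at the whole pipeline. It is only a sketch, assuming that keying each row by its MM-dd substring and then calling groupByKey is the right approach; "data.csv" is just a placeholder for my actual file path, and sc is the usual SparkContext:

import org.apache.spark.rdd.RDD

// Key each row by the "MM-dd" part of its second column, then group.
val grouped: RDD[(String, Iterable[String])] =
  sc.textFile("data.csv")  // placeholder path, not my real file
    .map(row => (row.split(",")(1).substring(5, 10), row))  // ("MM-dd", full row)
    .groupByKey()

If I understand the API correctly, this would give an RDD where each key is an "MM-dd" string and each value is an Iterable of all rows sharing that date, which seems to match the shape above. Is this the idiomatic way to do it?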