I have a CSV file that I am reading into Spark. The only column I am reading contains an array of time values, and I want each time value to be a separate row. I have tried a couple of different things, like explode, but they don't seem to work for me.
val checkin_data = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/home/saurabh/Projects/BigData/Datasets/YelpDataSet/yelp_academic_dataset_checkin.csv")
  .select("time")
This is the result I get if I select the first row:
checkin_data.first()
[[u'Fri-0:2', u'Sat-0:1', u'Sun-0:1', u'Wed-0:2', u'Sat-1:2', u'Thu-1:1', u'Wed-1:1', u'Sat-2:1', u'Sun-2:2', u'Thu-2:1', u'Wed-2:1', u'Fri-3:1', u'Sun-3:3', u'Thu-4:1', u'Tue-4:1', u'Sun-6:1', u'Wed-6:1', u'Fri-10:1', u'Sat-10:1', u'Mon-11:1', u'Wed-11:2', u'Mon-12:1', u'Sat-12:1', u'Tue-12:1', u'Sat-13:2', u'Thu-13:1', u'Tue-13:2', u'Wed-13:2', u'Fri-14:2', u'Sat-14:1', u'Wed-14:1', u'Fri-15:1', u'Sat-15:1', u'Thu-15:1', u'Tue-15:1', u'Fri-16:1', u'Sat-16:2', u'Sun-16:1', u'Tue-16:1', u'Sat-17:3', u'Sun-17:1', u'Fri-18:1', u'Mon-18:1', u'Sat-18:2', u'Sun-18:1', u'Tue-18:2', u'Wed-18:1', u'Fri-19:2', u'Mon-19:1', u'Sun-19:2', u'Thu-19:1', u'Wed-19:1', u'Mon-20:1', u'Sun-20:5', u'Thu-20:1', u'Tue-20:1', u'Wed-20:2', u'Fri-21:2', u'Sun-21:1', u'Thu-21:4', u'Tue-21:1', u'Wed-21:1', u'Fri-22:1', u'Thu-22:1', u'Fri-23:1', u'Mon-23:1', u'Sat-23:3', u'Sun-23:1', u'Thu-23:2', u'Tue-23:1']]
Is there a way to convert each row into multiple rows, like this?
Fri-0:2
Sat-0:1
Sun-0:1
Wed-0:2
Sat-1:2
Thu-1:1
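For reference, this is roughly what I tried with explode. It is only a sketch of my attempt, and it assumes the time column comes in as a plain comma-separated string (I am not sure that is even true, since the output above still shows brackets and quoting from the file):

```scala
import org.apache.spark.sql.functions.{explode, split}

// My attempt: treat "time" as a comma-separated string, split it
// into an array, then explode the array into one row per value.
val times = checkin_data
  .withColumn("time", explode(split(checkin_data("time"), ",\\s*")))

times.show()
```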
I am new to Spark, so I am sorry if I have not explained this well. Any help is much appreciated.