I have a CSV file that I am reading into Spark. The only column I am reading contains an array of time values, and I want each time value to be a separate row. I have tried a couple of approaches, such as explode, but they don't seem to work for me.

val checkin_data=sqlContext.read
                            .format("com.databricks.spark.csv")
                            .option("header", "true")
                            .load("/home/saurabh/Projects/BigData/Datasets/YelpDataSet/yelp_academic_dataset_checkin.csv")
                            .select("time")

This is the result I get if I select the first row

checkin_data.first()

[[u'Fri-0:2', u'Sat-0:1', u'Sun-0:1', u'Wed-0:2', u'Sat-1:2', u'Thu-1:1', u'Wed-1:1', u'Sat-2:1', u'Sun-2:2', u'Thu-2:1', u'Wed-2:1', u'Fri-3:1', u'Sun-3:3', u'Thu-4:1', u'Tue-4:1', u'Sun-6:1', u'Wed-6:1', u'Fri-10:1', u'Sat-10:1', u'Mon-11:1', u'Wed-11:2', u'Mon-12:1', u'Sat-12:1', u'Tue-12:1', u'Sat-13:2', u'Thu-13:1', u'Tue-13:2', u'Wed-13:2', u'Fri-14:2', u'Sat-14:1', u'Wed-14:1', u'Fri-15:1', u'Sat-15:1', u'Thu-15:1', u'Tue-15:1', u'Fri-16:1', u'Sat-16:2', u'Sun-16:1', u'Tue-16:1', u'Sat-17:3', u'Sun-17:1', u'Fri-18:1', u'Mon-18:1', u'Sat-18:2', u'Sun-18:1', u'Tue-18:2', u'Wed-18:1', u'Fri-19:2', u'Mon-19:1', u'Sun-19:2', u'Thu-19:1', u'Wed-19:1', u'Mon-20:1', u'Sun-20:5', u'Thu-20:1', u'Tue-20:1', u'Wed-20:2', u'Fri-21:2', u'Sun-21:1', u'Thu-21:4', u'Tue-21:1', u'Wed-21:1', u'Fri-22:1', u'Thu-22:1', u'Fri-23:1', u'Mon-23:1', u'Sat-23:3', u'Sun-23:1', u'Thu-23:2', u'Tue-23:1']]

Is there a way I can convert each row into multiple rows like this?

Fri-0:2

Sat-0:1

Sun-0:1

Wed-0:2

Sat-1:2

Thu-1:1

I am new to Spark, so I apologize if I have not explained this well. Any help is much appreciated.

1 Answer

Spark SQL's explode function should help you!

Here is a post that might help.
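A minimal sketch of what that could look like for your DataFrame. This assumes the "time" column comes out of the CSV reader as one big string such as "[u'Fri-0:2', u'Sat-0:1', ...]" (CSV has no array type, so the Python-style list is stored as text): strip the brackets and u'…' markers with regexp_replace, split on commas, then explode each element into its own row. The column and variable names match your snippet; the regex is an assumption about how your data is serialized, so adjust it if your file looks different.

```scala
import org.apache.spark.sql.functions.{col, explode, regexp_replace, split}

// Clean up the stringified list, split it into an array, and
// explode the array so each time value becomes its own row.
val times = checkin_data
  .withColumn("time", regexp_replace(col("time"), "\\[|\\]|u'|'", ""))  // drop [, ], u'…' markers
  .select(explode(split(col("time"), ",\\s*")).as("time"))               // one row per element

times.show(6)
```

If the column were already a true array type (e.g. loaded from JSON or Parquet), you could skip the regexp_replace/split cleanup and call explode on it directly.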
