I have a CSV file that I am reading into Spark. The only column I am reading contains an array of time values, and I want each time value to be a separate row. I have tried a couple of approaches, such as explode, but they don't seem to work for me.

val checkin_data=sqlContext.read
                            .format("com.databricks.spark.csv")
                            .option("header", "true")
                            .load("/home/saurabh/Projects/BigData/Datasets/YelpDataSet/yelp_academic_dataset_checkin.csv")
                            .select("time")

This is the result I get if I select the first row

checkin_data.first()

[[u'Fri-0:2', u'Sat-0:1', u'Sun-0:1', u'Wed-0:2', u'Sat-1:2', u'Thu-1:1', u'Wed-1:1', u'Sat-2:1', u'Sun-2:2', u'Thu-2:1', u'Wed-2:1', u'Fri-3:1', u'Sun-3:3', u'Thu-4:1', u'Tue-4:1', u'Sun-6:1', u'Wed-6:1', u'Fri-10:1', u'Sat-10:1', u'Mon-11:1', u'Wed-11:2', u'Mon-12:1', u'Sat-12:1', u'Tue-12:1', u'Sat-13:2', u'Thu-13:1', u'Tue-13:2', u'Wed-13:2', u'Fri-14:2', u'Sat-14:1', u'Wed-14:1', u'Fri-15:1', u'Sat-15:1', u'Thu-15:1', u'Tue-15:1', u'Fri-16:1', u'Sat-16:2', u'Sun-16:1', u'Tue-16:1', u'Sat-17:3', u'Sun-17:1', u'Fri-18:1', u'Mon-18:1', u'Sat-18:2', u'Sun-18:1', u'Tue-18:2', u'Wed-18:1', u'Fri-19:2', u'Mon-19:1', u'Sun-19:2', u'Thu-19:1', u'Wed-19:1', u'Mon-20:1', u'Sun-20:5', u'Thu-20:1', u'Tue-20:1', u'Wed-20:2', u'Fri-21:2', u'Sun-21:1', u'Thu-21:4', u'Tue-21:1', u'Wed-21:1', u'Fri-22:1', u'Thu-22:1', u'Fri-23:1', u'Mon-23:1', u'Sat-23:3', u'Sun-23:1', u'Thu-23:2', u'Tue-23:1']]

Is there a way I can convert each row into multiple rows like this?

Fri-0:2

Sat-0:1

Sun-0:1

Wed-0:2

Sat-1:2

Thu-1:1

I am new to Spark, so I apologize if I have not explained this well. Any help is much appreciated.

1 Answer

Spark SQL's explode function should help you!

Here is a post that might help.
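A minimal sketch of what that could look like for your DataFrame. This assumes the "time" column comes out of the CSV reader as one big string such as "[u'Fri-0:2', u'Sat-0:1', ...]" (CSV has no array type, so the Python-style list is stored as text): strip the brackets and u'…' markers with regexp_replace, split on commas, then explode each element into its own row. The column and variable names match your snippet; the regex is an assumption about how your data is serialized, so adjust it if your file looks different.

```scala
import org.apache.spark.sql.functions.{col, explode, regexp_replace, split}

// Clean up the stringified list, split it into an array, and
// explode the array so each time value becomes its own row.
val times = checkin_data
  .withColumn("time", regexp_replace(col("time"), "\\[|\\]|u'|'", ""))  // drop [, ], u'…' markers
  .select(explode(split(col("time"), ",\\s*")).as("time"))               // one row per element

times.show(6)
```

If the column were already a true array type (e.g. loaded from JSON or Parquet), you could skip the regexp_replace/split cleanup and call explode on it directly.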
