I have the following dataframe (df2):

+---------------+----------+----+-----+-----+
|Colours        |Model     |year|type |count|
+---------------+----------+----+-----+-----+
|red,green,white|Mitsubishi|2006|sedan|3    |
|gray,silver    |Mazda     |2010|SUV  |2    |
+---------------+----------+----+-----+-----+

I need to explode the column "Colours" so that each colour gets its own row, like this:

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+

I have created an array

val colrs=df2.select("Colours").collect.map(_.getString(0))

and tried to add the array to the dataframe

val cars=df2.withColumn("c",explode($"colrs")).select("Colours","Model","year","type")

but it didn't work. Any help, please?

zero323
Mohd Zoubi

1 Answer

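Your attempt failed because $"colrs" refers to a column named colrs, which df2 does not have; colrs is a local array on the driver, not a column. Instead, use the split and explode functions directly on the Colours column of df2: split turns the comma-separated string into an array column, and explode emits one row per array element.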

import org.apache.spark.sql.functions._
val cars = df2.withColumn("Colours", explode(split(col("Colours"), ","))).select("Colours","Model","year","type")

This gives the following output:

cars.show(false)

+-------+----------+----+-----+
|Colours|Model     |year|type |
+-------+----------+----+-----+
|red    |Mitsubishi|2006|sedan|
|green  |Mitsubishi|2006|sedan|
|white  |Mitsubishi|2006|sedan|
|gray   |Mazda     |2010|SUV  |
|silver |Mazda     |2010|SUV  |
+-------+----------+----+-----+
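
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and rebuilds df2 from the two sample rows in the question; the object name ExplodeColours, the helper explodeColours and the extra trim step (which guards against stray spaces around the commas) are illustrative additions, not part of the original question or answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split, trim}

object ExplodeColours {

  // Hypothetical helper: one output row per comma-separated colour.
  def explodeColours(df: DataFrame): DataFrame =
    df.withColumn("Colours", explode(split(col("Colours"), ","))) // string -> array -> rows
      .withColumn("Colours", trim(col("Colours")))                // assumption: drop padding spaces
      .select("Colours", "Model", "year", "type")

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; on a cluster, drop the master(...) call.
    val spark = SparkSession.builder()
      .appName("explode-colours")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df2 = Seq(
      ("red,green,white", "Mitsubishi", 2006, "sedan", 3),
      ("gray,silver", "Mazda", 2010, "SUV", 2)
    ).toDF("Colours", "Model", "year", "type", "count")

    explodeColours(df2).show(false)

    spark.stop()
  }
}
```

Overwriting "Colours" in withColumn, rather than adding a new column "c" as in the question, means the final select picks up the exploded values instead of the original comma-separated string.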
Ramesh Maharjan