-1

I have a table as shown:

Table 1


I want to transform it into the following table using Spark Java or Spark Scala

Transformed Table 1

Varun Gupta
  • 33
  • 1
  • 3

3 Answers3

4

make sure you have unique column names, denn you can do :

import or.apache.spark.sql.functions._

table
  .select("id","movie",explode(array("cast1", "cast2", "cast3", "cast4")).as("cast"))
  .where(col("cast").isNotNull)
Raphael Roth
  • 26,751
  • 15
  • 88
  • 145
1

With "union":

val table = List(
  (101, "ABC", "A", "B", "C", "D"),
  (102, "XZY", "G", "J", null, null))
  .toDF("ID", "Movie", "Cast1", "Cast2", "Cast3", "Cast4")

val columnsToUnion = List("Cast1", "Cast2", "Cast3", "Cast4")
val result = columnsToUnion.map(name => table.select($"ID", $"Movie", col(name).alias("Cast")).where(col(name).isNotNull))
  .reduce(_ union _)
result.show(false)

Output:

+---+-----+----+
|ID |Movie|Cast|
+---+-----+----+
|101|ABC  |A   |
|102|XZY  |G   |
|101|ABC  |B   |
|102|XZY  |J   |
|101|ABC  |C   |
|101|ABC  |D   |
+---+-----+----+

NOTE: Table cannot has several columns with the same name, assuming column names have such pattern: "Cast[i]"

pasha701
  • 6,831
  • 1
  • 15
  • 22
0
table.groupBy("ID", "Movie")
  .agg(collect_list("Cast1", "Cast2", "Cast3", "Cast2").as("cast"))
  .withColumn("cast", explode("cast"))

// a side note: you should always avoid duplicate column name in the same DataFrame

Fermat's Little Student
  • 5,549
  • 7
  • 49
  • 70