0

I have this DataFrame:

id name q1_w q1_x q1_y q1_z q2_w q2_x q2_y q2_z
1 AAAA val1 val2 val3 val4 valw valx valy valz
2 BBBB del1 del2 del3 del4 delw delx dely delz
3 CCCC sol1 sol2 sol3 sol4 null null null null

Trying to convert to this DataFrame:

id name w x y z
1 AAAA val1 val2 val3 val4
1 AAAA valw valx valy valz
2 BBBB del1 del2 del3 del4
2 BBBB delw delx dely delz
3 CCCC sol1 sol2 sol3 sol4

What DataFrame transformation can help me with this WITHOUT converting to RDD?

New Coder
  • 499
  • 4
  • 22

1 Answers1

0

If the number of columns in your source dataframe remains constant or same then you could just perform two separate transforations of selecting the columns and renaming it and then doing Union of the two dataframe and you could get the output that you want.

//source Data creation
val df = Seq((1,"AAAA","val1","val2","val3","val4","valw","valx","valy","valz"),(2,"BBBB","del1","del2","del3","del4","delw","delx","dely","delz"),(3,"CCCC","sol1","sol2","sol3","sol4",null,null,null,null)).toDF("id","name","q1_w","q1_x","q1_y","q1_z","q2_w","q2_x","q2_y","q2_z")
//creating first dataframe  with required columns and renaming them
val df1 = df.select("id","name","q1_w","q1_x","q1_y","q1_z").filter($"q1_w".isNotNull).filter($"q1_x".isNotNull).filter($"q1_y".isNotNull).filter($"q1_z".isNotNull).withColumnRenamed("q1_w","w").withColumnRenamed("q1_x","x").withColumnRenamed("q1_y","y").withColumnRenamed("q1_z","z")
//creating second dataframe  with required columns and renaming them
val df2 = df.select("id","name","q2_w","q2_x","q2_y","q2_z").filter($"q2_w".isNotNull).filter($"q2_x".isNotNull).filter($"q2_y".isNotNull).filter($"q2_z".isNotNull).withColumnRenamed("q2_w","w").withColumnRenamed("q2_x","x").withColumnRenamed("q2_y","y").withColumnRenamed("q2_z","z")
//union first and the second dataframe and you would get the output that is required.
val finaldf = df1.union(df2)

You can see the output as below: enter image description here

Nikunj Kakadiya
  • 2,689
  • 2
  • 20
  • 35