I have an existing case class with many fields:

case class output(
  userId: String,
  timeStamp: String,
  ...
)

and I am using it to generate the header for a Spark job, like this:

--------------------
userId | timeStamp
--------------------
1      | 2324444444
2      | 2334445556

Now I want to add more columns, and these columns will come from a map(attributeName, attributeValue) of attribute names. So my question is: how can I add a Map to the case class, and how can I then use the map keys as column names to generate dynamic columns? After this, my final output should look like:

----------------------------------------------------
userId | timeStamp  | attributeName1 | attributeName2
----------------------------------------------------
1      | 2324444444 |                |
2      | 2334445554 |                |
Satish Dalal
- check this --> https://stackoverflow.com/questions/36869134/pyspark-converting-a-column-of-type-map-to-multiple-columns-in-a-dataframe – kavetiraviteja Aug 24 '20 at 17:04
- Actually, I want to know whether I can do it using a case class or not. If yes, then how? – Satish Dalal Aug 26 '20 at 04:28
1 Answer
you can do something like this:

case class output(
  userId: String,
  timeStamp: String,
  keyvalues: Map[String, String],
  ...
)

import spark.implicits._
import org.apache.spark.sql.functions._

val df = spark.read.textFile(inputlocation).as[output]

// collect the distinct map keys -- these become the dynamic columns
val keysDF = df.select(explode(map_keys($"keyvalues"))).distinct()
val keys = keysDF.collect().map(_.getString(0))

// turn each key into a column expression: keyvalues[key] AS key
val keyCols = keys.map(k => col("keyvalues").getItem(k).as(k))

df.select(col("userId") +: col("timeStamp") +: keyCols: _*)

or you can check this thread for other ways to do it.
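To see the idea without a Spark cluster, here is a minimal plain-Scala sketch of the same logic: collect the distinct keys across all rows' maps, then look each key up per row, defaulting to blank when a row lacks it. The `Output` class and the `rows` sample data are hypothetical, invented for illustration only:

```scala
// Hypothetical case class mirroring the one in the answer.
case class Output(userId: String, timeStamp: String, keyvalues: Map[String, String])

// Sample rows: each row carries a different subset of attribute keys.
val rows = Seq(
  Output("1", "2324444444", Map("attributeName1" -> "a")),
  Output("2", "2334445556", Map("attributeName2" -> "b"))
)

// Collect every distinct map key across all rows -- these become the dynamic columns.
val dynamicCols = rows.flatMap(_.keyvalues.keys).distinct.sorted

// For each row, emit the fixed columns followed by one value per dynamic column,
// defaulting to "" when that row's map does not contain the key.
val table = rows.map { r =>
  Seq(r.userId, r.timeStamp) ++ dynamicCols.map(c => r.keyvalues.getOrElse(c, ""))
}
```

This mirrors what the Spark version does with `map_keys` + `distinct` + `getItem`, just on an in-memory collection.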

kavetiraviteja