I have a dataframe (in Pyspark) that has one of the row values as a dictionary:
df.show()
And it looks like:
+----+---+-----------------------------+
|name|age|info |
+----+---+-----------------------------+
|rob |26 |{color: red, car: volkswagen}|
|evan|25 |{color: blue, car: mazda} |
+----+---+-----------------------------+
Based on the comments to give more:
df.printSchema()
The types are strings
root
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- dict: string (nullable = true)
Is it possible to take the keys from the dictionary (color and car) and make them columns in the dataframe, and have the values be the rows for those columns?
Expected Result:
+----+---+-----------------------------+
|name|age|color |car |
+----+---+-----------------------------+
|rob |26 |red |volkswagen |
|evan|25 |blue |mazda |
+----+---+-----------------------------+
I didn't know I had to use df.withColumn() and somehow iterate through the dictionary to pick each one and then make a column out of it? I've tried to find some answers so far, but most were using Pandas, and not Spark, so I'm not sure if I can apply the same logic.