
I have a parquet file that I loaded using Spark. One of the columns contains nested key/value pairs stored as a JSON string. How do I flatten it?

df.printSchema
root
|-- location: string (nullable = true)
|-- properties: string (nullable = true)


texas,{"key":{"key1":"value1","key2":"value2"}}

thanks,


1 Answer


You can use explode on your dataframe and pass it a function that parses the JSON column using json4s. json4s has an easy parsing API; for your case it will look something like this:

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// json is the value of the properties column for one row,
// e.g. {"key":{"key1":"value1","key2":"value2"}}
val list = for {
  JObject(fields) <- parse(json) \ "key"
  JField("key1", JString(key1)) <- fields
  JField("key2", JString(key2)) <- fields
} yield Seq(key1, key2)

This flattens your dataframe.

If you also want to add a column for the key, you can use withColumn after the explode (keeping the key in the new column as well).
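If you are on a Spark version where the DataFrame.explode method that takes a function is deprecated, here is a sketch of the same idea using a UDF plus functions.explode; parseProperties is a hypothetical helper built on the json4s snippet above, and the column names come from your schema:

import org.apache.spark.sql.functions.{col, explode, udf}
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical helper: collect every name/value pair under "key" into a map.
def parseProperties(json: String): Map[String, String] =
  (for {
    JObject(fields) <- parse(json) \ "key"
    JField(name, JString(value)) <- fields
  } yield name -> value).toMap

val parseKeys = udf(parseProperties _)

// One row per nested pair; exploding a map column yields "key" and "value" columns.
val flattened = df.select(col("location"), explode(parseKeys(col("properties"))))
// flattened: location | key  | value
//            texas    | key1 | value1
//            texas    | key2 | value2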
