
So my initial schema looks like this:

root  
|-- database: String  
|-- table: String  
|-- data: struct (nullable = true)  
|    |-- element1: Int  
|    |-- element2: Char

The show() result displays the data column as a single ugly value like [null,2,3] etc.

What I want to do is turn the data struct into its own DataFrame, so the nested JSON's data is spread out among columns. But something like:

val dfNew = df.select("data") only gets me the same gross single column when I use show(), instead of the multiple columns specified by the schema (element1, element2, etc.).

Is there a way to do this?

Yuan JI
Brady Auen
  • Possible duplicate of [Querying Spark SQL DataFrame with complex types](http://stackoverflow.com/questions/28332494/querying-spark-sql-dataframe-with-complex-types) – zero323 Jul 18 '16 at 21:17
  • 1
    Check out [pandas.io.json.json_normalize](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html). – Alicia Garcia-Raboso Jul 18 '16 at 21:41

1 Answer


Like this?

import org.apache.spark.sql.functions.col

// Sample data: a struct column named "data" with two fields
case class Data(element1: Int, element2: String)

val df = sqlContext.createDataFrame(sc.parallelize(Array(
    (1, Data(12312, "test"))))).toDF("i", "data")

// Pull individual struct fields out into top-level columns
df.select(col("data.element1"), col("data.element2"))

or this?

df.select(col("data.*"))
  • Along that, I'd like to be able to do it without specifying each column so I could just take all that are available. – Brady Auen Jul 18 '16 at 21:45
  • That second one looks like what I want. I tried this `val dfdata2 = df.select(df.col("data.*"))` and it didn't work, only one column. – Brady Auen Jul 18 '16 at 21:57
  • 1
    I couldn't get it to work for me, but this does `val dfdata2 = df.selectExpr("data.*")` and apparently this one too: `val dfdata = df.select("data.*")` – Brady Auen Jul 18 '16 at 22:12
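Putting the question and the comments together, here is a minimal self-contained sketch of the star-expansion approach that the asker confirmed works. It uses the Spark 1.x-era `sqlContext`/`sc` entry points as in the question; the sample values are illustrative:

```scala
import org.apache.spark.sql.functions.col

// A struct column "data" with two fields, matching the question's schema
case class Data(element1: Int, element2: String)

val df = sqlContext.createDataFrame(sc.parallelize(Array(
    (1, Data(12312, "test"))))).toDF("i", "data")

// Both forms expand every field of the struct into its own column,
// without naming each field explicitly:
val flat1 = df.select("data.*")
val flat2 = df.selectExpr("data.*")

// flat1.printSchema() now shows top-level columns element1 and element2
flat1.show()
```

Note that `data.*` only expands the one struct level; deeper nesting needs another round of `select("inner.*")` on the result.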