
I am doing some testing in Spark using Scala. We usually read JSON files that need to be manipulated, like the following example:

test.json:

{"a":1,"b":[2,3]}
val test = sqlContext.read.json("test.json")

How can I convert it to the following format:

{"a":1,"b":2}
{"a":1,"b":3}
gsamaras
Nir Ben Yaacov

1 Answer


You can use explode function:

scala> import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.explode


scala> val test = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]}""")))
test: org.apache.spark.sql.DataFrame = [a: bigint, b: array<bigint>]

scala> test.printSchema
root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)

scala> val flattened = test.withColumn("b", explode($"b"))
flattened: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint]

scala> flattened.printSchema
root
 |-- a: long (nullable = true)
 |-- b: long (nullable = true)

scala> flattened.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  1|  3|
+---+---+
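Conceptually, `explode` does for an array column what `flatMap` does for plain Scala collections: each row is repeated once per array element, with the other columns copied alongside. A minimal plain-Scala sketch of the same transformation (no Spark needed, using a hypothetical `Rec` case class to stand in for the row):

```scala
// A stand-in for a row of the DataFrame: a scalar "a" and an array "b".
case class Rec(a: Long, b: Seq[Long])

val rows = Seq(Rec(1L, Seq(2L, 3L)))

// Pair the scalar with each element of the array, one output row per element,
// mirroring what explode($"b") produces.
val flattened = rows.flatMap(r => r.b.map(v => (r.a, v)))

println(flattened) // List((1,2), (1,3))
```

This is only an analogy for the semantics; in Spark itself `explode` runs as a generator expression inside the query plan rather than as a collection operation.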
zero323
  • Thanks, that works great in the shell. However, when I try this in IntelliJ I get an error when trying to reference column b with `$"b"`. Do you know how this can be resolved? – Nir Ben Yaacov Oct 02 '15 at 14:10
    Try [`import sqlContext.implicits._`](https://github.com/apache/spark/blob/8ecba3e86e53834413da8b4299f5791545cae12e/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L349). You can also use `org.apache.spark.sql.functions.col` and apply it to a `DataFrame` (`df("b")`). – zero323 Oct 02 '15 at 14:32
  • If `sqlContext.implicits._` doesn't work for you, try `import spark.implicits._` within scope. You may also need `import org.apache.spark.sql.functions.explode`. – JMess Aug 30 '18 at 17:57
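Pulling the comments together: the `$"b"` shorthand is only available after importing the implicits, while `col("b")` and `df("b")` work without them. A sketch of the three equivalent column references (assuming a `SparkSession` named `spark` and the `test` DataFrame from the answer; in an IDE project this goes in your own object/main, not the shell):

```scala
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._ // enables the $"..." column syntax outside the shell

// All three produce the same flattened DataFrame:
val viaDollar = test.withColumn("b", explode($"b"))    // needs the implicits import
val viaCol    = test.withColumn("b", explode(col("b"))) // no implicits required
val viaApply  = test.withColumn("b", explode(test("b"))) // apply on the DataFrame itself
```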