
I am doing some testing in Spark using Scala. We usually read JSON files that need to be manipulated, like the following example:

test.json:

{"a":1,"b":[2,3]}
val test = sqlContext.read.json("test.json")

How can I convert it to the following format:

{"a":1,"b":2}
{"a":1,"b":3}
gsamaras
Nir Ben Yaacov

1 Answer


You can use explode function:

scala> import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.explode


scala> val test = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]}""")))
test: org.apache.spark.sql.DataFrame = [a: bigint, b: array<bigint>]

scala> test.printSchema
root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)

scala> val flattened = test.withColumn("b", explode($"b"))
flattened: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint]

scala> flattened.printSchema
root
 |-- a: long (nullable = true)
 |-- b: long (nullable = true)

scala> flattened.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  1|  3|
+---+---+
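Conceptually, `explode` does for an array column what `flatMap` does for plain Scala collections: each row is repeated once per array element, with the other columns copied alongside. A minimal plain-Scala sketch of the same transformation (no Spark needed, using a hypothetical `Rec` case class to stand in for the row):

```scala
// A stand-in for a row of the DataFrame: a scalar "a" and an array "b".
case class Rec(a: Long, b: Seq[Long])

val rows = Seq(Rec(1L, Seq(2L, 3L)))

// Pair the scalar with each element of the array, one output row per element,
// mirroring what explode($"b") produces.
val flattened = rows.flatMap(r => r.b.map(v => (r.a, v)))

println(flattened) // List((1,2), (1,3))
```

This is only an analogy for the semantics; in Spark itself `explode` runs as a generator expression inside the query plan rather than as a collection operation.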
zero323
  • Thanks, that works great in the shell. However, when I try this in IntelliJ I get an error when trying to reference column b with `$"b"`. Do you know how this can be resolved? – Nir Ben Yaacov Oct 02 '15 at 14:10
    Try [`import sqlContext.implicits._`](https://github.com/apache/spark/blob/8ecba3e86e53834413da8b4299f5791545cae12e/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L349). You can also use `org.apache.spark.sql.functions.col` and apply it to a `DataFrame` (`df("b")`). – zero323 Oct 02 '15 at 14:32
  • If `sqlContext.implicits._` doesn't work for you, try `import spark.implicits._` within scope. You may also need `import org.apache.spark.sql.functions.explode`. – JMess Aug 30 '18 at 17:57
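Pulling the comments together: the `$"b"` shorthand is only available after importing the implicits, while `col("b")` and `df("b")` work without them. A sketch of the three equivalent column references (assuming a `SparkSession` named `spark` and the `test` DataFrame from the answer; in an IDE project this goes in your own object/main, not the shell):

```scala
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._ // enables the $"..." column syntax outside the shell

// All three produce the same flattened DataFrame:
val viaDollar = test.withColumn("b", explode($"b"))    // needs the implicits import
val viaCol    = test.withColumn("b", explode(col("b"))) // no implicits required
val viaApply  = test.withColumn("b", explode(test("b"))) // apply on the DataFrame itself
```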