If you want to handle data like this:
val json = """{"f1": [{"f2": [1, 2, 3] }, {"f2": [4,5,6]}, {"f2": [7,8,9]}, {"f2": [10,11,12]}]}"""
A valid schema for it is:
import org.apache.spark.sql.types._
val writerSchema = StructType(Seq(
  StructField("f1", ArrayType(
    StructType(Seq(
      StructField("f2", ArrayType(LongType))
    ))
  ))
))
writerSchema.printTreeString()
root
 |-- f1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- f2: array (nullable = true)
 |    |    |    |-- element: long (containsNull = true)
For this JSON, you shouldn't put an ArrayType directly inside another ArrayType: each element of f1 is a struct, so the inner array belongs inside a StructType.
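One way to apply this schema when reading the JSON string is sketched below. It assumes Spark 2.2 or later (for the reader overload that accepts a Dataset[String]) and a local SparkSession; the object name ReadNestedJson and the method name read are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, LongType, StructField, StructType}

// Hypothetical wrapper object, used only for this illustration.
object ReadNestedJson {
  val json: String =
    """{"f1": [{"f2": [1, 2, 3] }, {"f2": [4,5,6]}, {"f2": [7,8,9]}, {"f2": [10,11,12]}]}"""

  // Array of structs, where each struct holds an array of longs.
  val writerSchema: StructType = StructType(Seq(
    StructField("f1", ArrayType(
      StructType(Seq(
        StructField("f2", ArrayType(LongType))
      ))
    ))
  ))

  // Parse the JSON string with the explicit schema instead of relying on inference.
  def read(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for Seq(...).toDS()
    spark.read.schema(writerSchema).json(Seq(json).toDS())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("read-nested-json")
      .getOrCreate()
    val inputDF = read(spark)
    inputDF.printSchema()
    inputDF.show(false)
    spark.stop()
  }
}
```

Passing the schema explicitly skips the inference pass over the data and pins the element type to long.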
So let's suppose you have a DataFrame inputDF:
inputDF.printSchema
root
 |-- f1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- f2: array (nullable = true)
 |    |    |    |-- element: long (containsNull = true)
inputDF.show(false)
+-------------------------------------------------------------------------------------------------------+
|f1                                                                                                     |
+-------------------------------------------------------------------------------------------------------+
|[[WrappedArray(1, 2, 3)], [WrappedArray(4, 5, 6)], [WrappedArray(7, 8, 9)], [WrappedArray(10, 11, 12)]]|
+-------------------------------------------------------------------------------------------------------+
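To flatten this DataFrame, explode the array columns (f1, then f2) one at a time.
First, flatten column 'f1'. explode names its output column 'col' by default, so selecting col.* unpacks the fields of the struct:
import org.apache.spark.sql.functions.{col, explode}
val semiFlattenDF = inputDF.select(explode(col("f1"))).select(col("col.*"))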
semiFlattenDF.printSchema
root
 |-- f2: array (nullable = true)
 |    |-- element: long (containsNull = true)
semiFlattenDF.show
+------------+
|          f2|
+------------+
|   [1, 2, 3]|
|   [4, 5, 6]|
|   [7, 8, 9]|
|[10, 11, 12]|
+------------+
Now flatten column 'f2', naming the resulting column 'value':
val fullyFlattenDF = semiFlattenDF.select(explode(col("f2")).as("value"))
So now the DataFrame is flattened:
fullyFlattenDF.printSchema
root
 |-- value: long (nullable = true)
fullyFlattenDF.show
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
|   10|
|   11|
|   12|
+-----+
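The two explode steps can also be chained into a single expression. This is a minimal sketch, assuming the input has the f1/f2 schema shown above; the object name FlattenNested, the method name flattenF1 and the intermediate column name item are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper, used only for this illustration.
object FlattenNested {
  // Step 1: explode f1 into one row per struct (aliased as "item").
  // Step 2: explode each struct's f2 array into one row per long.
  def flattenF1(inputDF: DataFrame): DataFrame =
    inputDF
      .select(explode(col("f1")).as("item"))
      .select(explode(col("item.f2")).as("value"))
}
```

Calling FlattenNested.flattenF1(inputDF) should give the same single 'value' column as fullyFlattenDF. Note that explode drops rows whose array is null or empty; if those rows need to be kept, explode_outer (available since Spark 2.2) emits a null for them instead.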