TL;DR: When I dump a Spark DataFrame as JSON, I always end up with something like
{"key1": "v11", "key2": "v21"}
{"key1": "v12", "key2": "v22"}
{"key1": "v13", "key2": "v23"}
which is not valid JSON as a whole document (each line is, but the file as a whole isn't). I can manually edit the dumped file to get something I can parse:
[
{"key1": "v11", "key2": "v21"},
{"key1": "v12", "key2": "v22"},
{"key1": "v13", "key2": "v23"}
]
but I'm pretty sure I'm missing something that would let me avoid this manual edit. I just don't know what.
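The closest I've come to automating that edit is building the array myself on the driver, roughly like the sketch below. This only seems workable for small DataFrames, since collect() pulls every row into driver memory:

import java.nio.file.{Files, Paths}

// Rough sketch: wrap the rows in brackets and commas by hand.
// Only sensible for small DataFrames, since collect() brings all rows to the driver.
val jsonArray = myDataFrame.toJSON
  .collect()                 // one JSON string per row
  .mkString("[", ",\n", "]") // add the commas and enclosing brackets

Files.write(Paths.get("file.json"), jsonArray.getBytes("UTF-8"))

But doing this by hand defeats the point of a distributed write, so I assume there's a better way.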
More details:
I have an org.apache.spark.sql.DataFrame that I try to dump to JSON using the following code:
myDataFrame.write.json("file.json")
I also tried with:
myDataFrame.toJSON.saveAsTextFile("file.json")
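As a side note, forcing everything into a single partition doesn't change anything about the format; it just gives one part file with the same one-object-per-line layout:

myDataFrame.coalesce(1).write.json("file.json")  // still one JSON object per line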
In both of the attempts above, each row is dumped correctly, but the separating commas between the rows are missing, as well as the enclosing square brackets. Consequently, when I subsequently try to parse this file, the parser I use insults me and then fails.
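Oddly enough, Spark itself reads this dump back without complaint (sqlContext here is whatever SQL context created the DataFrame), so each row does seem to be valid JSON on its own:

// Spark happily reads its own dump back, so each line must be valid JSON by itself.
val readBack = sqlContext.read.json("file.json")
readBack.show()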
I would be grateful to learn how I can dump valid JSON. (Reading the DataFrameWriter documentation didn't provide me with any useful hints.)