I have stored some JSON documents in MongoDB. Each document looks like:
{"businessData":{"capacity":{"fuelCapacity":282}, ..}
After reading all the documents, I want to export them as a valid JSON file. Specifically:
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.{DataFrame, SaveMode}

// Read the JSON documents from the DB
val df: DataFrame = MongoSpark.load(sparkSession, readConfig)
df.show

// Export to the file system
df.coalesce(1).write.mode(SaveMode.Overwrite).json("export.json")
// df.show only displays the values of the nested structs:
+--------------------+
| businessData|
+--------------------+
|[[282],0,[true,20...|
|[[280],0,[true,20...|
|[[290],0,[true,20...|
|[[292],0,[true,20...|
|[[282],16,[true,2...|
+--------------------+
// export.json: one JSON object per line
{"businessData":{"capacity":{"fuelCapacity":282}, ..}
{"businessData":{"capacity":{"fuelCapacity":280}, ..}
{"businessData":{"capacity":{"fuelCapacity":290}, ..}
{"businessData":{"capacity":{"fuelCapacity":292}, ..}
{"businessData":{"capacity":{"fuelCapacity":282}, ..}
When exporting to the file system, however, I want to combine these 5 rows into a single JSON array and also add some custom metadata. For example:
{
  "metadata" : { "exportTime": "20/20/2020", ... },
  "allBusinessData" : [
    {"businessData":{"capacity":{"fuelCapacity":282}, ..},
    // all 5 rows from above
  ]
}
I have seen questions here and here advising against this approach. They also only partially answer my question, as they don't add a custom JSON structure to the export.
Assuming, however, that this is the only way I can proceed, how can I do it?
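Here is a minimal sketch of the direction I have in mind: collect the rows as JSON strings on the driver and write the wrapped document manually. This assumes the result set is small enough to collect; the export path and the metadata value are placeholders.

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Collect every row as a JSON string (only viable for small result sets,
// since all rows are pulled onto the driver).
val jsonRows: Array[String] = df.toJSON.collect()

// Wrap the rows in an array and prepend the custom metadata by hand.
val exportTime = java.time.LocalDate.now().toString  // placeholder metadata
val wrapped =
  s"""{"metadata":{"exportTime":"$exportTime"},"allBusinessData":[${jsonRows.mkString(",")}]}"""

// Write the single combined document with plain Java IO instead of df.write,
// because Spark's JSON writer emits one object per line.
Files.write(Paths.get("export.json"), wrapped.getBytes(StandardCharsets.UTF_8))

Is building the wrapper string by hand like this reasonable, or is there a more idiomatic way?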
Many thanks!