I have a code snippet that will read a Json array of the file path and then union the output and gives me two different tables. So I want to create two different createOrReplaceview(name) for those two tables and the name will be available in json array like below:
{
"source": [
{
"name": "testPersons",
"data": [
"E:\\dataset\\2020-05-01\\",
"E:\\dataset\\2020-05-02\\"
],
"type": "json"
},
{
"name": "testPets",
"data": [
"E:\\dataset\\2020-05-01\\078\\",
"E:\\dataset\\2020-05-02\\078\\"
],
"type": "json"
}
]
}
My output:
testPersons
+---+------+
|name |age|
+---+------+
|John |24 |
|Cammy |20 |
|Britto|30 |
|George|23 |
|Mikle |15 |
+---+------+
testPets
+---+------+
|name |age|
+---+------+
|piku |2 |
|jimmy |3 |
|rapido|1 |
+---+------+
Above is my Output and Json array my code iterate through each array and read the data section and read the data.
But how to change my below code to create a temp view for each output table.
for example i want to create .createOrReplaceTempView(testPersons)
and .createOrReplaceTempView(testPets)
view name as per in Json array
if (dataArr(counter)("type").value.toString() == "json") {
val name = dataArr(counter)("name").value.toString()
val dataPath = dataArr(counter)("data").arr
val input = dataPath.map(item => {
val rdd = spark.sparkContext.wholeTextFiles(item.str).map(i => "[" + i._2.replaceAll("\\}.*\n{0,}.*\\{", "},{") + "]")
spark
.read
.schema(Schema.getSchema(name))
.option("multiLine", true)
.json(rdd)
})
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], Schema.getSchema(name))
val finalDF = input.foldLeft(emptyDF)((x, y) => x.union(y))
finalDF.show()
Expected output:
spark.sql("SELECT * FROM testPersons").show()
spark.sql("SELECT * FROM testPets").show()
It should give me the table for each one.