Create views for two different DataFrames in Scala Spark

Question

I have a code snippet that reads a JSON array of file paths, unions the output, and gives me two different tables. I want to call createOrReplaceTempView(name) on each of those two tables, with the view name taken from the JSON array shown below:

      {
        "source": [
            {
                "name": "testPersons",
                "data": [
                "E:\\dataset\\2020-05-01\\",
                "E:\\dataset\\2020-05-02\\"
                ],
                "type": "json"
            },
            {
                "name": "testPets",
                "data": [
                "E:\\dataset\\2020-05-01\\078\\",
                "E:\\dataset\\2020-05-02\\078\\"
                ],
                "type": "json"
            }
        ]
    }

My output:

testPersons
        +------+---+
        |name  |age|
        +------+---+
        |John  |24 |
        |Cammy |20 |
        |Britto|30 |
        |George|23 |
        |Mikle |15 |
        +------+---+
testPets
        +------+---+
        |name  |age|
        +------+---+
        |piku  |2  |
        |jimmy |3  |
        |rapido|1  |
        +------+---+

Above are my output and the JSON array. My code iterates through each entry of the array and reads the files listed in its data section. How can I change the code below to create a temp view for each output table? For example, I want to call .createOrReplaceTempView("testPersons") and .createOrReplaceTempView("testPets"), with the view names taken from the JSON array.

if (dataArr(counter)("type").value.toString() == "json") {
  val name = dataArr(counter)("name").value.toString()
  val dataPath = dataArr(counter)("data").arr
  val input = dataPath.map(item => {
    val rdd = spark.sparkContext.wholeTextFiles(item.str).map(i => "[" + i._2.replaceAll("\\}.*\n{0,}.*\\{", "},{") + "]")
    spark
      .read
      .schema(Schema.getSchema(name))
      .option("multiLine", true)
      .json(rdd)
  })
  val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], Schema.getSchema(name))
  val finalDF = input.foldLeft(emptyDF)((x, y) => x.union(y))
  finalDF.show()
}

Expected output:

 spark.sql("SELECT * FROM testPersons").show()
 spark.sql("SELECT * FROM testPets").show()

It should give me the table for each one.

kfkhalili · Accepted Answer · 2020-09-03T14:47:16.607

Since you already have your data wrangled into shape, with your rows in DataFrames, and simply want to access them as temporary views, I suppose you are looking for createOrReplaceTempView and createOrReplaceGlobalTempView.

Both can be invoked on a DataFrame/Dataset.

df.createOrReplaceGlobalTempView("testPersons")
spark.sql("SELECT * FROM global_temp.testPersons").show()

df.createOrReplaceTempView("testPersons")
spark.sql("SELECT * FROM testPersons").show()

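For an explanation of the difference between the two, you can take a look at this question. In short, a temp view is scoped to the SparkSession that created it, while a global temp view is registered in the global_temp database and is shared across sessions of the same Spark application.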
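To make the scoping difference concrete, here is a minimal sketch, assuming a local SparkSession and a couple of stand-in rows. The object name ViewScopeSketch and the helper resolves are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object ViewScopeSketch {
  // Hypothetical helper: true if `viewName` can be resolved from `session`.
  def resolves(session: SparkSession, viewName: String): Boolean =
    Try(session.table(viewName)).isSuccess

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("view-scope-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in rows; a real job would use the DataFrames read from disk.
    val df = Seq(("John", 24), ("Cammy", 20)).toDF("name", "age")

    df.createOrReplaceTempView("testPersons")       // session-scoped view
    df.createOrReplaceGlobalTempView("testPersons") // registered in global_temp

    // A second session belonging to the same application.
    val other = spark.newSession()

    println(resolves(spark, "testPersons"))             // session that created the view
    println(resolves(other, "testPersons"))             // a different session
    println(resolves(other, "global_temp.testPersons")) // global view, different session

    spark.stop()
  }
}
```

The first and third checks should succeed, while the second should fail because the plain temp view is not visible outside the session that created it.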

If you are trying to read the JSON dynamically, load the files listed in data into DataFrames, and then register each one as its own table, you could do something like this:

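import net.liftweb.json._

// Mirrors one entry of the "source" array in the config file.
case class Source(name: String, data: List[String], `type`: String)

implicit val formats: DefaultFormats.type = DefaultFormats

// Read the whole config file and close the handle afterwards.
val src = scala.io.Source.fromFile("path/to/your/file")
val file = try src.mkString finally src.close()

val sourceList = (parse(file) \ "source").children
for (source <- sourceList) {
  val s = source.extract[Source]
  // Load every path using the format named in "type" and union the results.
  val df = s.data.map(d => spark.read.format(s.`type`).load(d)).reduce(_ union _)
  // Register the DataFrame under the name given in the config.
  df.createOrReplaceTempView(s.name)
}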
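Applied to the loop in the question, this amounts to calling finalDF.createOrReplaceTempView(name) once the per-path DataFrames have been unioned. Below is a minimal sketch of that step in isolation, assuming a local SparkSession and in-memory rows standing in for the files; the object RegisterViewsSketch and the helper registerView are hypothetical names introduced for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RegisterViewsSketch {
  // Hypothetical helper: union the per-path DataFrames and register the
  // result as a temp view under `name`.
  def registerView(name: String, parts: Seq[DataFrame]): DataFrame = {
    require(parts.nonEmpty, s"no DataFrames to union for view '$name'")
    val finalDF = parts.reduce(_ union _)
    finalDF.createOrReplaceTempView(name)
    finalDF
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("register-views-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-ins for the DataFrames read from each path.
    val persons = Seq(
      Seq(("John", 24), ("Cammy", 20)).toDF("name", "age"),
      Seq(("Britto", 30)).toDF("name", "age")
    )
    val pets = Seq(
      Seq(("piku", 2)).toDF("name", "age"),
      Seq(("jimmy", 3)).toDF("name", "age")
    )

    registerView("testPersons", persons)
    registerView("testPets", pets)

    // Each view can now be queried by the name taken from the config.
    spark.sql("SELECT * FROM testPersons").show()
    spark.sql("SELECT * FROM testPets").show()

    spark.stop()
  }
}
```

Because the view name is an ordinary string, it can come straight from the "name" field of each entry in the JSON array.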

So do you have one single DataFrame with persons and pets (and other groups) and you want to create multiple tables out of it? Did I understand you correctly? — kfkhalili, Sep 03 '20 at 10:35
