This might be a stupid question, but just out of curiosity, can I do the following?
import org.apache.spark.sql.{DataFrame, Dataset}
import spark.implicits._

// Read the raw CSV from HDFS
val rawDF = spark.read
  .option("mode", "FAILFAST")
  .option("inferSchema", "true")
  .option("header", "false")
  .csv(hdfs_file)

// Split into train and test sets
val trainRatio = 0.8
val testRatio = 0.2
val Array(trainDF, testDF) = rawDF.randomSplit(Array(trainRatio, testRatio))

// Attempt to build a Dataset whose elements are themselves DataFrames
var temp: Dataset[DataFrame] = spark.emptyDataset[DataFrame]
val sampleDataset = Seq(trainDF, testDF).toDS()
temp = temp.union(sampleDataset)
In IntelliJ, I am getting an error on the .toDS() line. If we could do this, we could easily apply a map operation over the Dataset to run arbitrary computations on the DataFrames inside it. Am I wrong to think this? Does this have to do with an Encoder not being available for DataFrame?
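To make the intent concrete, here is roughly the kind of per-split computation I have in mind. This version does compile, because it just uses a plain Scala Seq on the driver rather than a Dataset[DataFrame]; the summarize helper is only a placeholder I made up for illustration:

// Plain Scala collection of DataFrames held on the driver,
// mapped over with an ordinary (non-Spark) map.
val splits: Seq[DataFrame] = Seq(trainDF, testDF)

// Hypothetical per-DataFrame computation: row count plus column count.
def summarize(df: DataFrame): (Long, Int) = (df.count(), df.columns.length)

val summaries: Seq[(Long, Int)] = splits.map(summarize)

What I was hoping for was the same pattern, but with the map running as a Spark Dataset operation instead of on the driver.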