I am trying to write a generic method that creates a Dataset, with the client supplying the data file name, the file format, and 'something' that can represent the input case class for the schema. I tried this:
def dataSetFromFileAndCaseClass[T](spark: SparkSession, fileName: String, schema: ClassTag[T], fileFormat: String) = {
  import spark.implicits._
  fileFormat match {
    case "csv"  => spark.read.csv(fileName).as[schema]
    case "json" => spark.read.json(fileName).as[schema]
    case _      => throw new Exception("File format not supported")
  }
}
...and it doesn't work as I expected :). It doesn't even compile, since 'schema' is a value, not a type I can hand to 'as'.
'as' is defined like this:
def as[U : Encoder]: Dataset[U] = Dataset[U](sparkSession, logicalPlan)
So as I understand it, 'as' expects an implicit Encoder for whatever case class the client wants to use.
So the 'schema' parameter must somehow bring an Encoder into scope for whatever case class the client calls dataSetFromFileAndCaseClass with.
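For a concrete case class this works fine, because spark.implicits._ supplies the Encoder. For example (Person and people.json are just placeholders of mine):

import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Long)

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
// Encoder[Person] is found implicitly via spark.implicits._
val people = spark.read.json("people.json").as[Person]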
How do I modify the 'dataSetFromFileAndCaseClass' signature to get this working?
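My best guess is that the signature needs an Encoder context bound (or an implicit Encoder[T] parameter) instead of the ClassTag, roughly like the sketch below, but I'm not sure this is correct or idiomatic:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Guess: ask the caller for an implicit Encoder[T] instead of a ClassTag,
// so that .as[T] can find the encoder it needs.
def dataSetFromFileAndCaseClass[T : Encoder](spark: SparkSession,
                                             fileName: String,
                                             fileFormat: String): Dataset[T] = {
  fileFormat match {
    case "csv"  => spark.read.csv(fileName).as[T]
    case "json" => spark.read.json(fileName).as[T]
    case _      => throw new Exception("File format not supported")
  }
}

// The caller would then do something like:
//   import spark.implicits._
//   val ds = dataSetFromFileAndCaseClass[Person](spark, "people.json", "json")

Is that the right way to do it, or is there a better approach?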