This code works:
import org.apache.spark.sql.SparkSession

object FilesToDFDS {
  case class Student(id: Int, name: String, dept: String)

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().appName("local").master("local[*]").getOrCreate()
    import ss.implicits._
    val path = "data.txt"
    val rdd = ss.sparkContext.textFile(path).map(x => x.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))
    val df = ss.read.format("csv").option("delimiter", " ").load(path).map(x => Student(x.getString(0).toInt, x.getString(1), x.getString(2)))
    val ds = ss.read.textFile(path).map(x => x.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))
    val rddToDF = ss.sqlContext.createDataFrame(rdd)
  }
}
But if the case class is moved inside main, the df and ds lines give this compilation error:
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
And the rddToDF line gives this compilation error: No TypeTag available for Student
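For reference, this is a minimal sketch of the failing variant (same code as above, only with Student declared inside main); the marked lines are the ones that no longer compile:

```scala
import org.apache.spark.sql.SparkSession

object FilesToDFDS {
  def main(args: Array[String]): Unit = {
    // Student is now local to the main method
    case class Student(id: Int, name: String, dept: String)

    val ss = SparkSession.builder().appName("local").master("local[*]").getOrCreate()
    import ss.implicits._
    val path = "data.txt"

    val rdd = ss.sparkContext.textFile(path).map(_.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))

    // Does not compile: "Unable to find encoder for type stored in a Dataset"
    val ds = ss.read.textFile(path).map(_.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))

    // Does not compile: "No TypeTag available for Student"
    val rddToDF = ss.sqlContext.createDataFrame(rdd)
  }
}
```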
In these questions ques1, ques2, people answered to move the case class outside main, and that worked. But why does it work only when the case class is moved outside the main method?