This code works:
import org.apache.spark.sql.SparkSession

object FilesToDFDS {
  case class Student(id: Int, name: String, dept: String)

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder().appName("local").master("local[*]").getOrCreate()
    import ss.implicits._
    val path = "data.txt"
    val rdd = ss.sparkContext.textFile(path).map(x => x.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))
    val df = ss.read.format("csv").option("delimiter", " ").load(path).map(x => Student(x.getString(0).toInt, x.getString(1), x.getString(2)))
    val ds = ss.read.textFile(path).map(x => x.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))
    val rddToDF = ss.sqlContext.createDataFrame(rdd)
  }
}
But if the case class is moved inside main, the df and ds lines give this compilation error:
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
And the rddToDF line gives this compilation error: No TypeTag available for Student
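For reference, this is a minimal sketch of the failing variant (same code as above, only with Student declared inside main); the marked lines are the ones that no longer compile:

```scala
import org.apache.spark.sql.SparkSession

object FilesToDFDS {
  def main(args: Array[String]): Unit = {
    // Student is now local to the main method
    case class Student(id: Int, name: String, dept: String)

    val ss = SparkSession.builder().appName("local").master("local[*]").getOrCreate()
    import ss.implicits._
    val path = "data.txt"

    val rdd = ss.sparkContext.textFile(path).map(_.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))

    // Does not compile: "Unable to find encoder for type stored in a Dataset"
    val ds = ss.read.textFile(path).map(_.split(" ")).map(x => Student(x(0).toInt, x(1), x(2)))

    // Does not compile: "No TypeTag available for Student"
    val rddToDF = ss.sqlContext.createDataFrame(rdd)
  }
}
```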
In these questions ques1, ques2, people answered to move the case class outside main, and that worked. But why does it work only when the case class is moved outside the main method?