
I am new to Scala and I am trying to build a framework that can read multiple types of CSV files, with all read operations going through one class. For example, I have two types of CSVs, Student and Professor, and I am doing something like this:

abstract class Person
case class Student(name: String, major: String, marks: Double) extends Person
case class Professor(name: String, salary: Double) extends Person

My CSV reader looks something like this:

  private def readCsv[T: Encoder](location: String): Dataset[T] = {
    spark
      .read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")
      .csv(location)
      .as[T]
  }

def data(location: String): Dataset[Person] = readCsv[Person](location)

I am getting a compile-time error on the last line: `No implicit arguments of Type: Encoder[Person]`. The call to this method looks something like this:

val studentData = storage.data[Student]("Student.csv")

Is there any better way to achieve this?

  • The error is telling you that you need to supply an implicit argument. You can do this in three ways: (1) by defining an `implicit val` in scope, (2) by defining an `implicit class`, or (3) by explicitly passing the missing `Encoder` argument in a secondary argument list (see the sketch after these comments). – Robin Green Nov 05 '18 at 21:29
  • When you're getting an error, it's helpful if you include the error and any associated stack traces in the question. For more advice on how to write a good question, see: https://stackoverflow.com/help/how-to-ask – Geoffrey Wiseman Nov 05 '18 at 21:31
  • I am getting a compile-time error `No implicit arguments of Type: Encoder[Person]` at the line `def data(location: String): Dataset[Person] = readCsv[Person](location)` – Harshad_Pardeshi Nov 05 '18 at 21:56
  • Seems related - https://stackoverflow.com/a/41082540/864369 and https://stackoverflow.com/a/32454596/864369 – Dan W Nov 05 '18 at 21:57
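
A minimal sketch of that advice, assuming the ADT from the question and a hypothetical `Storage` class to match the call site; the key change is giving `data` its own type parameter with an `Encoder` context bound, so the implicit is resolved at the call site, where the concrete type is known:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical wrapper; Person, Student and Professor are as defined in the question.
class Storage(spark: SparkSession) {

  private def readCsv[T: Encoder](location: String): Dataset[T] =
    spark
      .read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")
      .csv(location)
      .as[T]

  // [T <: Person : Encoder] is sugar for an extra implicit parameter list
  // (implicit ev: Encoder[T]); the caller supplies the Encoder for the
  // concrete case class, typically via `import spark.implicits._`.
  def data[T <: Person : Encoder](location: String): Dataset[T] =
    readCsv[T](location)
}

Usage, with the Encoder for the case class coming from `spark.implicits`:

val spark = SparkSession.builder().master("local[*]").appName("csv-reader").getOrCreate()
import spark.implicits._
val storage = new Storage(spark)
val studentData: Dataset[Student] = storage.data[Student]("Student.csv")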

1 Answer

  1. Your ADT definition should probably be sealed (with its case classes marked final); otherwise it is hard to derive Encoders for it.
  2. IIRC, Spark sadly does not support sum types, because there is no schema representation for them. A somewhat common hack is to represent Either[A, B] as (Option[A], Option[B]), but it is a pain (a sketch follows below).
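
Both points in a minimal sketch (the `PersonRow` name and the `fromPerson` helper are hypothetical, not from the answer): seal the hierarchy, mark the cases final, and flatten the sum type into a product of Options with the convention that exactly one side is populated per row.

sealed abstract class Person
final case class Student(name: String, major: String, marks: Double) extends Person
final case class Professor(name: String, salary: Double) extends Person

// Spark can encode this product type (a struct of two nullable structs),
// while it has no schema representation for the sum type Person itself.
final case class PersonRow(student: Option[Student], professor: Option[Professor])

object PersonRow {
  // Keep the "exactly one side defined" invariant in one place.
  def fromPerson(p: Person): PersonRow = p match {
    case s: Student    => PersonRow(Some(s), None)
    case pr: Professor => PersonRow(None, Some(pr))
  }
}

With `import spark.implicits._` in scope, `Dataset[PersonRow]` gets an Encoder through the ordinary product derivation, and the original `Person` can be recovered per row by matching on which side is defined.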
Dominic Egger