
I am new to Scala and I am trying to build a framework that can read multiple types of CSV files, with all read operations going through one class. For example, I have two types of CSVs, Student and Professor, and I am doing something like this:

abstract class Person
case class Student(name: String, major: String, marks: Double) extends Person
case class Professor(name: String, salary: Double) extends Person

My CSV reader looks something like this:

  private def readCsv[T: Encoder](location: String): Dataset[T] = {
    spark
      .read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")
      .csv(location)
      .as[T]
  }

def data(location: String): Dataset[Person] = readCsv[Person](location)

I am getting a compile-time error on the last line: `No implicit arguments of Type: Encoder[Person]`. The call to this method looks something like this:

val studentData = storage.data[Student]("Student.csv")

Is there any better way to achieve this?

  • The error is telling you that you need to supply an implicit argument. You can do this in three ways: (1) by defining an `implicit val` in scope, (2) by defining an `implicit class`, or (3) by explicitly passing the missing `Encoder` argument in a secondary argument list (see the sketch after these comments). – Robin Green Nov 05 '18 at 21:29
  • When you're getting an error, it's helpful if you include the error and any associated stack traces in the question. For more advice on how to write a good question, see: https://stackoverflow.com/help/how-to-ask – Geoffrey Wiseman Nov 05 '18 at 21:31
  • I am getting a compile-time error `No implicit arguments of Type: Encoder[Person]` at the line `def data(location: String): Dataset[Person] = readCsv[Person](location)` – Harshad_Pardeshi Nov 05 '18 at 21:56
  • Seems related - https://stackoverflow.com/a/41082540/864369 and https://stackoverflow.com/a/32454596/864369 – Dan W Nov 05 '18 at 21:57
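
A minimal sketch of that advice, assuming the ADT from the question and a hypothetical `Storage` class to match the call site; the key change is giving `data` its own type parameter with an `Encoder` context bound, so the implicit is resolved at the call site, where the concrete type is known:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical wrapper; Person, Student and Professor are as defined in the question.
class Storage(spark: SparkSession) {

  private def readCsv[T: Encoder](location: String): Dataset[T] =
    spark
      .read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")
      .csv(location)
      .as[T]

  // [T <: Person : Encoder] is sugar for an extra implicit parameter list
  // (implicit ev: Encoder[T]); the caller supplies the Encoder for the
  // concrete case class, typically via `import spark.implicits._`.
  def data[T <: Person : Encoder](location: String): Dataset[T] =
    readCsv[T](location)
}

Usage, with the Encoder for the case class coming from `spark.implicits`:

val spark = SparkSession.builder().master("local[*]").appName("csv-reader").getOrCreate()
import spark.implicits._
val storage = new Storage(spark)
val studentData: Dataset[Student] = storage.data[Student]("Student.csv")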

1 Answer

  1. Your ADT definition should probably be sealed (with its case classes marked final); otherwise it is hard to derive Encoders for it.
  2. IIRC, Spark sadly does not support sum types, because there is no schema representation for them. A somewhat common hack is to represent Either[A, B] as (Option[A], Option[B]), but it is a pain (a sketch follows below).
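
Both points in a minimal sketch (the `PersonRow` name and the `fromPerson` helper are hypothetical, not from the answer): seal the hierarchy, mark the cases final, and flatten the sum type into a product of Options with the convention that exactly one side is populated per row.

sealed abstract class Person
final case class Student(name: String, major: String, marks: Double) extends Person
final case class Professor(name: String, salary: Double) extends Person

// Spark can encode this product type (a struct of two nullable structs),
// while it has no schema representation for the sum type Person itself.
final case class PersonRow(student: Option[Student], professor: Option[Professor])

object PersonRow {
  // Keep the "exactly one side defined" invariant in one place.
  def fromPerson(p: Person): PersonRow = p match {
    case s: Student    => PersonRow(Some(s), None)
    case pr: Professor => PersonRow(None, Some(pr))
  }
}

With `import spark.implicits._` in scope, `Dataset[PersonRow]` gets an Encoder through the ordinary product derivation, and the original `Person` can be recovered per row by matching on which side is defined.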
Dominic Egger