
I want to create a Scala trait that should be implemented with a case class T. The trait simply loads data and transforms it into a Spark Dataset of type T. I get the error that no encoder can be found for the type stored in the Dataset, which I think is because Scala does not know that T should be a case class. How can I tell the compiler that? I've seen somewhere that I should mention Product, but there is no such class defined. Feel free to suggest other ways to do this!

I have the following code, but it does not compile, failing with: 42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ [INFO] .as[T]

I'm using Spark 1.6.1

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}

/**
  * A trait that moves data on Hadoop with Spark based on the location and the granularity of the data.
  */
trait Agent[T] {
  /**
    * Load a DataFrame from the location and convert it into a Dataset
    * @return Dataset[T]
    */
  protected def load(): Dataset[T] = {
    // Read in the data
    SparkContextKeeper.sqlContext.read
      .format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/")
      .as[T]
  }
}
Sparky
  • possible duplicate of http://stackoverflow.com/questions/34715611/why-is-the-error-unable-to-find-encoder-for-type-stored-in-a-dataset-when-enco and http://stackoverflow.com/questions/38664972/why-is-unable-to-find-encoder-for-type-stored-in-a-dataset-when-creating-a-dat – Shankar Nov 10 '16 at 15:48

2 Answers


Your code is missing three things:

  • Indeed, you must let the compiler know that T is a subclass of Product (the superclass of all Scala case classes and tuples)
  • The compiler also requires the TypeTag and ClassTag of the actual case class; Spark uses these implicitly to overcome type erasure
  • An import of sqlContext.implicits._

Unfortunately, you can't add type parameters with context bounds in a trait, so the simplest workaround would be to use an abstract class instead:

import org.apache.spark.sql.{Dataset, SQLContext}
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

abstract class Agent[T <: Product : ClassTag : TypeTag] {
  // Supplied by concrete subclasses (implied by the question's code)
  protected def location: String

  protected def load(): Dataset[T] = {
    val sqlContext: SQLContext = SparkContextKeeper.sqlContext
    import sqlContext.implicits._
    // Same read as in the question, now with an encoder for T in scope
    sqlContext.read
      .format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/")
      .as[T]
  }
}

Obviously, this isn't equivalent to using a trait, and might suggest that this design isn't the best fit for the job. Another alternative is placing load in an object and moving the type parameter to the method:

object Agent {
  // `protected` dropped so callers outside the object can use it;
  // `location` becomes a parameter since an object cannot leave it abstract
  def load[T <: Product : ClassTag : TypeTag](location: String): Dataset[T] = {
    val sqlContext = SparkContextKeeper.sqlContext
    import sqlContext.implicits._
    sqlContext.read.format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/").as[T]
  }
}

Which one is preferable is mostly up to where and how you're going to call load and what you're planning to do with the result.
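
For illustration, here is a minimal sketch of calling each variant, reusing the imports above; the case class Event, the subclass EventAgent, and the "events" location are hypothetical stand-ins, not from the original post:

// Hypothetical case class matching the CSV's columns
case class Event(id: String, value: String)

// Variant 1: one concrete Agent subclass per dataset
class EventAgent extends Agent[Event] {
  protected def location: String = "events"
  def events: Dataset[Event] = load()
}

// Variant 2: a single generic entry point, with the type chosen at the call site
val events: Dataset[Event] = Agent.load[Event]("events")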

Tzach Zohar
  • Thanks for the solution, and for the useful comment on the design! I had already tried your first and third points, but the second one was what I needed to do the trick :) Btw, I think an abstract class is fine too, as I will be inheriting Agent for every dataset I'm creating. – Sparky Nov 14 '16 at 08:52
  • import reflect.runtime.universe.TypeTag ? from: https://stackoverflow.com/questions/42285560/scala-compiler-says-no-typetag-available-for-t-in-method-using-generics – skjagini Jan 10 '19 at 22:14

You need to take two actions (a sketch follows the list):

  1. Add import sparkSession.implicits._ to your imports
  2. Declare your trait as trait Agent[T <: Product]
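
A minimal sketch of how these two steps can fit together, assuming Spark 2.x's SparkSession; the implicit Encoder parameter and the location member are additions not spelled out in this answer, there to keep the trait compilable (as the accepted answer explains, a bare Product bound alone does not give .as[T] an encoder):

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

trait Agent[T <: Product] {
  protected def location: String  // hypothetical, mirrors the question

  // The encoder is resolved at the call site, where
  // `import sparkSession.implicits._` supplies one for case classes
  protected def load(spark: SparkSession)(implicit enc: Encoder[T]): Dataset[T] =
    spark.read
      .format("com.databricks.spark.csv")
      .load("/myfolder/" + location + "/2016/10/01/")
      .as[T]
}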
C4stor