
I've written a Spark job:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}

When I run the main function in the IDE, two errors occur:

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

Error:(15, 67) not enough arguments for method as: (implicit evidence$1: org.apache.spark.sql.Encoder[Person])org.apache.spark.sql.Dataset[Person].
Unspecified value parameter evidence$1.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

But in the Spark shell I can run this job without any error. What is the problem?

Milad Khajavi

3 Answers


The error message says that no Encoder can be found for the Person case class.

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.

Move the declaration of the case class outside the scope of SimpleApp.
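For example, here is a minimal sketch of the question's job with only that change applied (the case class lifted to the top level, everything else as posted):

import org.apache.spark.{SparkConf, SparkContext}

// Top-level case class: the imported implicits can now provide an Encoder for it.
case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}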

Jacek Laskowski
  • Why does scoping make any difference here? I am getting that error while using the REPL. – JackOrJones Sep 15 '16 at 17:39
  • I'm trying to understand why the scope of the case class makes a difference. If you can point me to any resource I can read to understand it, that would be a great help. Pretty new to Scala implicits :( @jacek-laskowski – Sudev Ambadi Sep 01 '18 at 11:04
  • I don't think I'm capable of explaining why the solution works the way it does. I vaguely remember that it has nothing to do with implicits, which are simply a mechanism to plug code in; I think the code itself is the root cause. – Jacek Laskowski Sep 01 '18 at 11:16

You get the same error if you import both sqlContext.implicits._ and spark.implicits._ in SimpleApp (the order doesn't matter).

Removing one or the other is the solution:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .getOrCreate()

val sqlContext = spark.sqlContext
import sqlContext.implicits._ //sqlContext OR spark implicits
//import spark.implicits._    //sqlContext OR spark implicits

case class Person(age: Long, city: String)
val persons = spark.read.json("/tmp/persons.json").as[Person]

Tested with Spark 2.1.0

The funny thing is that if you import the same object's implicits twice, you will not have any problems.
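For illustration, a small self-contained sketch of that behaviour (hypothetical object name, assuming Spark 2.x):

import org.apache.spark.sql.SparkSession

object ImplicitsTwice {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("ImplicitsTwice").getOrCreate()
    val sqlContext = spark.sqlContext

    // Importing the implicits of the *same* object twice is harmless:
    import spark.implicits._
    import spark.implicits._

    // Uncommenting this second, different source of encoders would make the
    // implicit Encoders ambiguous again:
    // import sqlContext.implicits._

    val ds = Seq(1, 2, 3).toDS() // resolves fine with a single implicits source
    ds.show()
    spark.stop()
  }
}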

Paul Leclercq

@Milad Khajavi

Define the Person case classes outside of object SimpleApp. Also, add import sqlContext.implicits._ inside the main() function.

Santhoshm