
I'm upgrading one of my projects from Spark 1.6 to Spark 2.0.1. The following code works for Spark 1.6 but does not work for 2.0.1:

   def count(df: DataFrame): DataFrame = {
    val sqlContext = df.sqlContext
    import sqlContext.implicits._

    df.map { case Row(userId: String, itemId: String, count: Double) =>
      (userId, itemId, count)
    }.toDF("userId", "itemId", "count")
   }

Here is the error message:

Error:(53, 12) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
           ^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
       ^

I tried using `df.rdd.map` instead of `df.map`, then got the following error:

Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
    }.toDF("userId", "itemId", "count")
      ^

How can I convert an RDD of tuples to a DataFrame in Spark 2.0?

Rainfield
  • did you try importing `importing spark.implicits._ `? – rogue-one Jun 01 '17 at 03:28
  • @rogue-one yes, tried changing `val sqlContext = df.sqlContext import sqlContext.implicits._` to `val spark = df.sparkSession import spark.implicits._`, but got the same error. – Rainfield Jun 01 '17 at 03:33

1 Answer


There is most likely a syntax error elsewhere in your code, because your map function appears to be written correctly even though you are getting

Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)]. Unspecified value parameter evidence$7

Your code works as-is in my Spark shell; I have tested it.
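
For reference, here is a minimal sketch of the Spark 2.x version of the method, following the variant the asker mentioned in the comments (`df.sparkSession` instead of `df.sqlContext`). In Spark 2.x a `DataFrame` is a `Dataset[Row]`, so `df.map` requires an implicit `Encoder`; importing the session's implicits provides encoders for primitive and tuple types:

```scala
import org.apache.spark.sql.{DataFrame, Row}

def count(df: DataFrame): DataFrame = {
  // Spark 2.x: bring tuple Encoders (and toDF) into scope
  val spark = df.sparkSession
  import spark.implicits._

  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}
```

The same import also adds `toDF` to `RDD`s of products, so `df.rdd.map { ... }.toDF(...)` compiles as well; the "value toDF is not a member of org.apache.spark.rdd.RDD" error typically means `spark.implicits._` was not in scope at that call site.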

HaoYuan