I'm upgrading one of my projects from Spark 1.6 to Spark 2.0.1. The following code works in Spark 1.6 but fails to compile under 2.0.1:
def count(df: DataFrame): DataFrame = {
  val sqlContext = df.sqlContext
  import sqlContext.implicits._
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}
Here are the error messages:
Error:(53, 12) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
I tried using df.rdd.map instead of df.map, and then got the following errors:
Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
}.toDF("userId", "itemId", "count")
^
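For reference, that failing variant is just the same method body with df.rdd.map in place of df.map:

def count(df: DataFrame): DataFrame = {
  val sqlContext = df.sqlContext
  import sqlContext.implicits._
  // mapping over the RDD instead of the Dataset to avoid the Encoder requirement
  df.rdd.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")   // this is the call the compiler rejects
}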
How can I convert an RDD of tuples to a DataFrame in Spark 2.0?
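My guess is that in 2.0 I should be importing the implicits from the SparkSession rather than from the SQLContext, roughly along the lines of the untested sketch below (I'm assuming df.sparkSession is accessible and that spark.implicits._ brings an Encoder for the tuple into scope), but I'm not sure whether that is the right approach:

import org.apache.spark.sql.{DataFrame, Row}

def count(df: DataFrame): DataFrame = {
  val spark = df.sparkSession
  import spark.implicits._   // assumption: provides the Encoder[(String, String, Double)] needed by map
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}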