I'm upgrading one of my projects from Spark 1.6 to Spark 2.0.1. The following code works in Spark 1.6 but fails to compile under 2.0.1:
def count(df: DataFrame): DataFrame = {
  val sqlContext = df.sqlContext
  import sqlContext.implicits._
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}
Here are the error messages:
Error:(53, 12) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
I tried using df.rdd.map instead of df.map, and then got the following errors:
Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
}.toDF("userId", "itemId", "count")
^
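For reference, that failing variant is just the same method body with df.rdd.map in place of df.map:

def count(df: DataFrame): DataFrame = {
  val sqlContext = df.sqlContext
  import sqlContext.implicits._
  // mapping over the RDD instead of the Dataset to avoid the Encoder requirement
  df.rdd.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")   // this is the call the compiler rejects
}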
How can I convert an RDD of tuples to a DataFrame in Spark 2.0?
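My guess is that in 2.0 I should be importing the implicits from the SparkSession rather than from the SQLContext, roughly along the lines of the untested sketch below (I'm assuming df.sparkSession is accessible and that spark.implicits._ brings an Encoder for the tuple into scope), but I'm not sure whether that is the right approach:

import org.apache.spark.sql.{DataFrame, Row}

def count(df: DataFrame): DataFrame = {
  val spark = df.sparkSession
  import spark.implicits._   // assumption: provides the Encoder[(String, String, Double)] needed by map
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}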