It doesn't work because Row
is not a Product
type and createDataFrame
with as single RDD
argument is defined only for RDD[A]
where A <: Product
.
If you want to use RDD[Row]
you have to provide a schema as the second argument. If you think about it is should be obvious. Row
is just just a container of Any
and as such it doesn't provide enough information for schema inference.
Assuming this is the same RDD
as defined in your previous question then schema is easy to generate:
import org.apache.spark.sql.types._
import org.apache.spark.rdd.RD
val rowRdd: RDD[Row] = ???
val schema = StructType(
(1 to rowRdd.first.size).map(i => StructField(s"_$i", StringType, false))
)
val df = sqlContext.createDataFrame(rowRdd, schema)