
I read a CSV file into an RDD and am trying to convert it to a DataFrame, but it throws an error:

scala> rows.toDF()
<console>:34: error: value toDF is not a member of org.apache.spark.rdd.RDD[Array[String]]
              rows.toDF()

scala> rows.take(2)       
    Array[Array[String]] = Array(Array(1, 0, 3, "Braund, ...            

What am I doing wrong?

    does this help? http://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark – Rohit Chatterjee Nov 11 '15 at 18:32
  • @Rohit Chatterjee: First of all, thanks for the quick response. I checked there; the most-voted answer does the same thing I was trying. So, what am I missing? – PKM15 Nov 11 '15 at 18:37
  • In that answer their RDD is of type org.apache.spark.sql.Row, whereas yours is of type Array[String]. Can you convert it to a Row? If you can (and I feel like you should be able to), then importing sqlContext.implicits should work – Rohit Chatterjee Nov 11 '15 at 18:42
  • Thank you! I will try to do that – PKM15 Nov 11 '15 at 18:43

1 Answer


When you want to convert an RDD to a DataFrame, you need to create an SQLContext and import its implicit functions, as @zero323 suggested:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
rows.toDF

Since your RDD is an RDD[Array[String]], you'll first need to convert each Array[String] to a Row:

import org.apache.spark.sql.Row
rows.map(Row.fromSeq(_)).toDF
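Note that in Spark 1.x the `toDF` implicits only cover RDDs of `Product` types (case classes and tuples), so for an `RDD[Row]` you typically pass an explicit schema to `sqlContext.createDataFrame` instead. A minimal sketch, assuming all columns are strings; the column names below are hypothetical (the question doesn't give them):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Hypothetical column names for illustration only
val schema = StructType(Seq(
  StructField("PassengerId", StringType, nullable = true),
  StructField("Survived", StringType, nullable = true),
  StructField("Pclass", StringType, nullable = true),
  StructField("Name", StringType, nullable = true)
))

// Convert each Array[String] into a Row, then build the DataFrame
val rowRDD = rows.map(arr => Row.fromSeq(arr.toSeq))
val df = sqlContext.createDataFrame(rowRDD, schema)
df.printSchema()
```

With a schema you also get meaningful column names to query against, instead of the default `_1`, `_2`, ... that `toDF` produces when none are supplied.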
eliasah