
I read a CSV file into an RDD and am trying to convert it to a DataFrame, but it throws an error:

scala> rows.toDF()
<console>:34: error: value toDF is not a member of org.apache.spark.rdd.RDD[Array[String]]
              rows.toDF()

scala> rows.take(2)       
    Array[Array[String]] = Array(Array(1, 0, 3, "Braund, ...            

What am I doing wrong?

    does this help? http://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark – Rohit Chatterjee Nov 11 '15 at 18:32
  • @Rohit Chatterjee: First of all, thanks for the quick response. I checked there; the most-voted answer does the same thing I was trying. So, what am I missing? – PKM15 Nov 11 '15 at 18:37
  • In that answer their RDD is of type org.apache.spark.sql.Row, whereas yours is of type Array[String]. Can you convert it to a Row? If you can (and I feel like you should be able to), then importing sqlContext.implicits should work – Rohit Chatterjee Nov 11 '15 at 18:42
  • Thank you! I will try to do that – PKM15 Nov 11 '15 at 18:43

1 Answer


When you want to convert an RDD to a DataFrame, you need to create an SQLContext and import its implicit functions, as @zero323 suggested:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
rows.toDF

Since your RDD is an RDD[Array[String]], you'll first need to convert each Array[String] to a Row:

import org.apache.spark.sql.Row
rows.map(Row.fromSeq(_)).toDF
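Note that in Spark 1.x the `toDF` implicits only cover RDDs of `Product` types (case classes and tuples), so for an `RDD[Row]` you typically pass an explicit schema to `sqlContext.createDataFrame` instead. A minimal sketch, assuming all columns are strings; the column names below are hypothetical (the question doesn't give them):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Hypothetical column names for illustration only
val schema = StructType(Seq(
  StructField("PassengerId", StringType, nullable = true),
  StructField("Survived", StringType, nullable = true),
  StructField("Pclass", StringType, nullable = true),
  StructField("Name", StringType, nullable = true)
))

// Convert each Array[String] into a Row, then build the DataFrame
val rowRDD = rows.map(arr => Row.fromSeq(arr.toSeq))
val df = sqlContext.createDataFrame(rowRDD, schema)
df.printSchema()
```

With a schema you also get meaningful column names to query against, instead of the default `_1`, `_2`, ... that `toDF` produces when none are supplied.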
eliasah