I am trying to read a CSV file into a DataFrame (https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html)

Using:

spark-1.3.1-bin-hadoop2.6
spark-csv_2.11-1.1.0

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("test")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.csvFile("filename.csv")
    ...
  }
}

Error:

value csvFile is not a member of org.apache.spark.sql.SQLContext

I was trying to do as advised here: Spark - load CSV file as DataFrame?

But sqlContext doesn't seem to recognize the csvFile method of the CsvContext class.

Any advice would be appreciated!


1 Answer

I am also facing some issues with CSV (without spark-csv), but here are some things you can look at and check that they are OK:

  1. Build the Spark shell with the spark-csv library using sbt assembly.
  2. Add the spark-csv dependency to the POM.XML of your Maven project.
  3. Use the load/save methods of the DataFrame API (see the sketch after this list).
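A minimal sketch of option 3 with the Spark 1.3 API, assuming the spark-csv package is on the classpath (the file name and options here are just examples):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Name the data source explicitly; without it, load() defaults to Parquet,
// which is why reading a CSV path directly fails.
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "filename.csv", "header" -> "true"))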

SPARK-CSV GITHUB

Refer to the spark-csv GitHub readme.md page and you will be up and running :)
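For the csvFile style from your question, the readme's usage is roughly the following sketch (from memory, so double-check it there; "cars.csv" is just an example file). The implicit import is the piece that is easy to miss:

import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._  // brings csvFile into scope on SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.csvFile("cars.csv")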

  • **1&2.** I already have **"com.databricks" % "spark-csv_2.11" % "1.1.0"** in my sbt, and when compiling it does not complain about missing dependencies. **3.** How will the load/save methods help me read the CSV? As for the readme.md page, I have already read it; it advises using sqlContext.load, which compiles but fails at runtime with "filename.csv" is not a Parquet file. That is why I had followed the advice from here: http://stackoverflow.com/questions/29704333/how-to-read-csv-file-as-dataframe – user3376961 Jun 16 '15 at 18:43
  • If you want to save the CSV as Parquet, then use the DataFrame created: HashMap options = new HashMap(); options.put("header", "true"); options.put("path", dataFile); DataFrame df = sqlContext.load("com.databricks.spark.csv", options); df.save(, "parquet"); – Chetandalal Jun 16 '15 at 19:05
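The comment's snippet is Java-flavored and leaves the first argument of save blank; in Scala the same idea looks roughly like this (the output path is illustrative, since the original omits it):

// Load the CSV through the spark-csv data source ...
val options = Map("header" -> "true", "path" -> "filename.csv")
val df = sqlContext.load("com.databricks.spark.csv", options)

// ... then persist it as Parquet via Spark 1.3's DataFrame.save(path, source).
df.save("output.parquet", "parquet")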