Coming from the R world, I want to import a .csv into Spark (v1.6.1) using the Scala shell (./spark-shell).

My .csv has a header and looks like:
"col1","col2","col3"
1.4,"abc",91
1.3,"def",105
1.35,"gh1",104
Thanks.
Spark 2.0+
Since databricks/spark-csv
has been integrated into Spark, reading .csv files is straightforward using the SparkSession:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
.master("local")
.appName("Word Count")
.getOrCreate()
val df = spark.read.option("header", true).csv(path)
Older versions
After restarting my spark-shell I figured it out myself; maybe this helps others.

After installing spark-csv as described here and starting the spark-shell with ./spark-shell --packages com.databricks:spark-csv_2.11:1.4.0:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load("/home/vb/opt/spark/data/mllib/mydata.csv")
scala> df.printSchema()
root
|-- col1: double (nullable = true)
|-- col2: string (nullable = true)
|-- col3: integer (nullable = true)
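For intuition, the inferSchema option that produced the typed schema above can be sketched in plain Scala (no Spark needed): for each column, pick the narrowest type that parses every value, otherwise fall back to string. This is a simplified illustration under that assumption, not Spark's actual inference logic; the inferType helper is hypothetical:

```scala
// The sample rows from the question, as raw strings (what you get
// without inferSchema: every column is a string).
val rows = Seq(
  Seq("1.4", "abc", "91"),
  Seq("1.3", "def", "105"),
  Seq("1.35", "gh1", "104")
)

// Pick the narrowest type that parses every value in a column.
def inferType(values: Seq[String]): String =
  if (values.forall(_.toIntOption.isDefined)) "integer"
  else if (values.forall(_.toDoubleOption.isDefined)) "double"
  else "string"

// Transpose rows into columns, then infer one type per column.
val types = rows.transpose.map(inferType)
// types: Seq("double", "string", "integer") -- matching printSchema above
```

Note that inference requires an extra pass over the data, which is why Spark only does it when you opt in with .option("inferSchema", "true").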