
I have this code:

  import org.apache.spark.sql.SparkSession

  def main(args: Array[String]): Unit = {

    // Creation of the Spark session
    implicit val sparkSession = SparkSession.builder().appName("seco_trial").getOrCreate()
    println("test")

    val df = sparkSession.read.format("csv").option("header", "true").load("D:/Userfiles/mir/Downloads/extract_sec_mav2.csv")

    println("test2")
  }

The line where I create `val df` gives me the following error:

  2018-09-17 14:27:24 WARN FileStreamSink:66 - Error while looking for metadata directory.
  Exception in thread "main" java.io.IOException: No FileSystem for scheme: D
      at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
      [...]

Let's rule out a wrong path as the problem; my file correctly has the .csv extension.

Can you please tell me what the problem is? Why can't I load the file?

kaileena
  • option("header", "true") value should given as boolean value not string value. Use option("header", true) instead – Praveen L Sep 17 '18 at 12:36
  • Don't know if it's just a c&p error, but the main method isn't closed off with a closing `}` in your code example. – James Whiteley Sep 17 '18 at 12:48
  • I will put that, but the error remains :( – kaileena Sep 17 '18 at 12:50
  • Add a master when creating the SparkSession. For a local run you can take all available cores like this: `implicit val sparkSession = SparkSession.builder().appName("seco_trial").master("local[*]").getOrCreate()` (see the sketch after these comments). – Praveen L Sep 17 '18 at 13:36
  • The only exception you have listed is related to the path. Your exception is telling you it cannot find a filesystem for the path `D:/...`. What other problems do you have? – jacks Sep 17 '18 at 13:49
  • Does this help you - https://stackoverflow.com/questions/29704333/spark-load-csv-file-as-dataframe – jacks Sep 17 '18 at 13:52
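
Pulling Praveen L's suggestion out of the comments into a runnable shape, a minimal sketch of the session setup for a local run might look like this (the app name is taken from the question; `local[*]` uses all available cores):

  import org.apache.spark.sql.SparkSession

  // Sketch per Praveen L's comment: an explicit master is needed when the
  // app is run directly (e.g. from an IDE) rather than through spark-submit,
  // which would otherwise supply one.
  implicit val sparkSession = SparkSession.builder()
    .appName("seco_trial")
    .master("local[*]")
    .getOrCreate()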

1 Answer


I am not sure why, but I have to add "file:///" in front of the path.
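
A sketch of the read with the scheme made explicit (the path is the one from the question); `file:///` tells Hadoop to use the local filesystem rather than parsing the drive letter `D` as a filesystem scheme:

  // With an explicit file:/// scheme, Hadoop resolves the path against the
  // local filesystem instead of treating the drive letter "D" as a scheme.
  val df = sparkSession.read
    .format("csv")
    .option("header", "true")
    .load("file:///D:/Userfiles/mir/Downloads/extract_sec_mav2.csv")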

kaileena
  • file:/// lets Spark know to read the data from the local filesystem. If you don't want to specify file://, change the 'fs.defaultFS' value to 'file:///'; then by default Spark will look up data on the local filesystem (see the sketch after these comments). – Lakshman Battini Sep 18 '18 at 00:38
  • Can you please rephrase a bit? I think I kind of understand, but I'm not sure. – kaileena Sep 18 '18 at 08:37
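
A sketch of the alternative Lakshman Battini describes: setting `fs.defaultFS` on the session's Hadoop configuration so that paths without an explicit scheme resolve against the local filesystem (setting it programmatically is one option; it can also go in core-site.xml). Whether this helps with a Windows drive-letter path depends on how your Hadoop version parses `D:/...`:

  // Make the local filesystem the default for paths without a scheme,
  // so Spark looks them up under file:// rather than HDFS.
  sparkSession.sparkContext.hadoopConfiguration.set("fs.defaultFS", "file:///")

  val df = sparkSession.read
    .format("csv")
    .option("header", "true")
    .load("D:/Userfiles/mir/Downloads/extract_sec_mav2.csv")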