
I am trying to load an xls/xlsx file from a server.

I am using the spark-excel library (https://github.com/crealytics/spark-excel) with this code:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

SparkConf sparkConf = new SparkConf();
SparkContext sparkContext = new SparkContext("local", "234", sparkConf); // master "local", app name "234"
SparkSession sparkSession = SparkSession.builder().sparkContext(sparkContext).getOrCreate();

SQLContext sqlContext = sparkSession.sqlContext().newSession();
Dataset<Row> dframe = sqlContext.read()
        .format("com.crealytics.spark.excel")
        .option("timestampFormat", "yyyy-MMM-dd HH:mm:ss") // MMM = month; "mmm" would mean minutes
        .option("dataAddress", "'My Sheet'!A1")
        .option("useHeader", "true")
        .option("treatEmptyValuesAsNulls", "false")
        .option("inferSchema", "true")
        .load("/home/test/myfile/sample.xlsx"); // local path

This code works perfectly with a local file:

"/home/test/myfile/sample.xlsx"

How can I read a file from a server with a URL like this?

"http://10.0.0.1:8080/serverFiles/test.xlsx"

I tried replacing the local path with the server URL above and got this error:

 Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)

Can Spark read an xlsx/xls file directly from a server URL? (I am not asking about CSV.)

KishanCS

1 Answer


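You cannot use HTTP paths directly with Spark. The error means that Hadoop, which Spark uses for file access, has no FileSystem registered for the http scheme.

Refer to this SO question.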
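To see why the local path works and the URL does not: roughly speaking, Spark hands the path to Hadoop, which picks a FileSystem implementation from the scheme of the path's URI. A path with no scheme falls back to the default file system (fs.defaultFS, which is the local file system in a default local setup), while the http scheme has no FileSystem registered here, hence "No FileSystem for scheme: http". The small sketch below (the class name SchemeCheck is hypothetical) only prints the scheme each location carries; it does not call Hadoop itself.

```java
import java.net.URI;

// Illustration only: Hadoop chooses a FileSystem from the URI scheme of a path.
// This class just shows which scheme (if any) each location has.
public class SchemeCheck {

    // Returns the URI scheme of a location, or null when it has none
    // (in that case Hadoop uses the default file system, fs.defaultFS).
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        String local = "/home/test/myfile/sample.xlsx";
        String remote = "http://10.0.0.1:8080/serverFiles/test.xlsx";
        System.out.println(local + " -> " + schemeOf(local));   // prints "... -> null"
        System.out.println(remote + " -> " + schemeOf(remote)); // prints "... -> http"
    }
}
```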

Sathiyan S
  • How can I do that in Java for an xlsx file? I don't want to use CSV. – KishanCS Jan 17 '19 at 12:22
  • Here you go, refer to this: https://stackoverflow.com/questions/6259339/how-to-read-a-text-file-directly-from-internet-using-java – Sathiyan S Jan 17 '19 at 12:23
  • This gives a Scanner object, which again cannot be loaded using sqlContext. – KishanCS Jan 17 '19 at 12:31
  • You can create a local file or String from this Scanner, then use parallelize to create an RDD and build a DataFrame from it. – Sathiyan S Jan 17 '19 at 12:54
  • That is what I am doing, and I want to avoid it. The file has to be read by Spark directly from the URL, not by reading it from the URL, writing it to a local file, and then reading that. I need to know whether there is a way to read a URL using sqlContext (this is what I asked in my question). – KishanCS Jan 17 '19 at 13:00
  • This is not possible as far as I know. Spark accepts any HDFS-compatible data source, but not HTTP. That's why I suggested getting the file content on your driver and distributing it. – Sathiyan S Jan 17 '19 at 13:06
  • You're welcome. Keep us posted if you find a way to handle this other than the suggested one; it would be helpful. – Sathiyan S Jan 17 '19 at 13:08
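A minimal sketch of the workaround suggested in the comments (fetch the file on the driver, save it locally, then load that path), using only the Java standard library. UrlToTempFile and download are hypothetical names, not part of Spark or spark-excel; the commented spark-excel call simply reuses the options from the question. Note that this still writes a local file, which is what the asker wanted to avoid, so it is a fallback rather than a direct answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: download a URL to a temporary local file so that
// Spark can read it through a scheme it understands (a plain local path).
public class UrlToTempFile {

    // Copies the resource at `url` into a new temp file ending in `suffix`
    // and returns its path. The caller should delete the file when done.
    public static Path download(String url, String suffix) throws IOException {
        Path tmp = Files.createTempFile("spark-excel-", suffix);
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // REPLACE_EXISTING is needed because createTempFile already created the file
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: UrlToTempFile <url>");
            return;
        }
        Path local = download(args[0], ".xlsx");
        System.out.println("Saved to " + local);
        // Then, for example:
        // sqlContext.read().format("com.crealytics.spark.excel")
        //     /* ...same options as in the question... */
        //     .load(local.toString());
    }
}
```

In local mode, as in the question, the temporary file is visible to Spark. On a real cluster a driver-local path is generally not readable by the executors, so the file would have to be copied to HDFS or another location every node can reach.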