I am trying to read a CSV file stored in GCS using Spark. I have a simple Spark Java project that does nothing but read a CSV; the following code is used in it:

    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Hello world");
    SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();

    // "sep" and "delimiter" are aliases for the field separator in Spark's CSV reader;
    // the quote character is configured with "quote", so the conflicting "delimiter" option is dropped
    Dataset<Row> dataset = sparkSession.read()
            .option("header", true)
            .option("sep", ",")
            .option("quote", "\"")
            .csv("gs://abc/WDC_age.csv");

but it throws the following error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: gs

Can anyone help me with this? I just want to read a CSV from GCS using Spark.

Thanks In Advance :)

  • Does this answer your question? [Reading from google storage gs:// filesystem from local spark instance](https://stackoverflow.com/questions/40716055/reading-from-google-storage-gs-filesystem-from-local-spark-instance) – Saif Ahmad Dec 26 '22 at 07:07

2 Answers

In my case, I just added the following dependency to my pom.xml file:

    <dependency>
        <groupId>com.google.cloud.bigdataoss</groupId>
        <artifactId>gcs-connector</artifactId>
        <version>hadoop3-2.2.4</version>
    </dependency>

and it worked for me.
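If you prefer not to bundle the connector into your build, the same Maven coordinates can be pulled in at submit time instead. This is a sketch of the invocation; the main class name and jar path are placeholders for your own project:

```shell
# Alternative to the pom.xml change: resolve the connector at submit time.
# Coordinates match the dependency above; pick the version matching your Hadoop line.
spark-submit \
  --packages com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.4 \
  --class com.example.Main \
  target/my-spark-app.jar
```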

No FileSystem for scheme: gs indicates that Spark couldn't find the GCS connector on its classpath. I guess you are not running on a Dataproc cluster (where the connector is preinstalled); in that case you need to install the connector yourself: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
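Beyond putting the connector jar on the classpath, running outside Dataproc usually also means registering the `gs://` filesystem and supplying credentials explicitly. A configuration sketch, assuming the gcs-connector jar is already on the classpath; the key-file path and bucket name are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GcsCsvRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("gcs-csv")
                // Register the connector's FileSystem implementations for the gs:// scheme
                .config("spark.hadoop.fs.gs.impl",
                        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
                .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
                // Authenticate with a service-account key (placeholder path)
                .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
                .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
                        "/path/to/key.json")
                .getOrCreate();

        Dataset<Row> df = spark.read().option("header", true).csv("gs://abc/WDC_age.csv");
        df.show();
    }
}
```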

– Dagang