Reading CSV file from bucket through PySpark in Anaconda

Asked Jan 28 '20 at 09:58

Active Jan 28 '20 at 10:16

I am trying to read CSV files from a GCS bucket through PySpark in Anaconda. I am running the following at the PySpark command prompt:

from pyspark import SparkContext
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf() \
    .setMaster("local[2]") \
    .setAppName("Test") \
    .set("spark.jars", r"C:\path\to\jar\gcs-connector-hadoop-latest.jar")  # raw string, so \t is not read as a tab

sc = SparkContext.getOrCreate(conf=conf)

spark = SparkSession.builder \
    .config(conf=sc.getConf()) \
    .getOrCreate()

spark.read.json("gs://my-bucket")

The error I'm getting:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://my-bucket_spark_metadata

I searched for this error, but all the solutions I found are about changing the file path. Since the path I'm referencing is a GCS storage bucket, I can't change it. Please help.

Spark 2.0: Relative path in absolute URI (spark-warehouse)

edited Jan 28 '20 at 10:16

Willi Mentzel

asked Jan 28 '20 at 09:58

sopana

Specify the absolute path like gs://my-bucket/filename.json – Ghost Jan 28 '20 at 10:07
You can also use gs://my-bucket/*.json to read multiple JSON files. – Aditya Vikram Singh Nov 19 '20 at 16:19
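
To illustrate the two comments above: the error message shows `_spark_metadata` appended directly to `gs://my-bucket`, which suggests that Spark was given a URI with no path inside the bucket. Below is a minimal sketch of building and checking such a path before passing it to `spark.read.json`. The helper names (`has_object_path`, `gcs_uri`, `read_json_from_gcs`) are hypothetical and not part of PySpark, and the sketch assumes the GCS connector and credentials are already configured as in the question. Rejecting the bare bucket root is a deliberately conservative choice here, not a Spark requirement.

```python
from urllib.parse import urlparse


def has_object_path(uri: str) -> bool:
    """Return True if a gs:// URI names something inside the bucket."""
    parsed = urlparse(uri)
    # "gs://my-bucket" parses to netloc="my-bucket" and an empty path.
    return (
        parsed.scheme == "gs"
        and bool(parsed.netloc)
        and parsed.path not in ("", "/")
    )


def gcs_uri(bucket: str, object_name: str) -> str:
    """Build gs://<bucket>/<object_name>; object_name may be a file or a glob."""
    return f"gs://{bucket}/{object_name.lstrip('/')}"


def read_json_from_gcs(spark, bucket: str, object_name: str):
    """Read JSON through an existing SparkSession (hypothetical helper)."""
    uri = gcs_uri(bucket, object_name)
    if not has_object_path(uri):
        raise ValueError(f"expected a path inside the bucket, got {uri!r}")
    return spark.read.json(uri)
```

With the session from the question, a call might look like `read_json_from_gcs(spark, "my-bucket", "filename.json")`, or with `"*.json"` as the object name to match several files.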

0 Answers