I am trying to read CSV files from a GCS bucket using PySpark installed through Anaconda. I run the following in the PySpark shell:
from pyspark import SparkContext
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
conf = SparkConf() \
    .setMaster("local[2]") \
    .setAppName("Test") \
    .set("spark.jars", "C:\\path\\to\\jar\\gcs-connector-hadoop-latest.jar")
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession.builder \
    .config(conf=sc.getConf()) \
    .getOrCreate()
spark.read.json("gs://my-bucket")
The error I'm getting:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://my-bucket_spark_metadata
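One thing I notice is that the URI in the message looks like my path with `_spark_metadata` appended directly to the bucket name. Here is a minimal sketch of that reading using only Python's standard `urllib.parse`. The `describe` helper and the `METADATA_DIR` constant are names I made up for illustration, and I am assuming plain string concatenation, which I have not confirmed is what Spark does internally.

```python
from urllib.parse import urlsplit

# Suffix seen in the error message (assumed to be appended to my path as-is).
METADATA_DIR = "_spark_metadata"

def describe(base):
    """Split base + METADATA_DIR into (authority, path). Hypothetical helper."""
    parts = urlsplit(base + METADATA_DIR)
    return parts.netloc, parts.path

# Path exactly as I pass it to spark.read
no_slash = describe("gs://my-bucket")
# Same bucket with a trailing slash, for comparison only
with_slash = describe("gs://my-bucket/")

print(no_slash)
print(with_slash)
```

Without the trailing slash, the suffix ends up in the bucket (authority) part and the path part is empty. With the slash, the bucket stays `my-bucket` and the suffix becomes the path. I have not verified whether this is related to the exception.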
I searched for this error, but the solutions I found all suggest changing the file path. Since the path I'm referencing is a GCS bucket, I can't change it. Please help.