I am reading CSV files from a GCS bucket through PySpark in Anaconda. I am running the following at the PySpark prompt:

from pyspark import SparkContext
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf() \
    .setMaster("local[2]") \
    .setAppName("Test") \
    .set("spark.jars", r"C:\path\to\jar\gcs-connector-hadoop-latest.jar")  # raw string, so "\t" is not read as a tab

sc = SparkContext.getOrCreate(conf=conf)

spark = SparkSession.builder \
    .config(conf=sc.getConf()) \
    .getOrCreate()

spark.read.json("gs://my-bucket")

The error I'm getting:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: gs://my-bucket_spark_metadata

I searched for this error, but the solutions I found all talk about changing the file path. Since the path I'm referencing is a GCS bucket, I can't change it. Please help.
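One thing that may be worth checking: the URI in the error, `gs://my-bucket_spark_metadata`, looks like `_spark_metadata` was joined directly onto the bucket name, which would happen if the path passed to the reader has no path component after the bucket. A minimal sketch of a workaround, assuming that is the cause, is to give the URI an explicit root path (a trailing `/`) before passing it to `spark.read`. The helper name `with_root_path` is hypothetical and not part of PySpark.

```python
from urllib.parse import urlsplit


def with_root_path(uri: str) -> str:
    """Append '/' when the URI names only a bucket and has no path part.

    Hypothetical helper for illustration; it only edits the string and
    does not contact GCS or Spark.
    """
    parts = urlsplit(uri)
    # "gs://my-bucket" parses to netloc="my-bucket" and an empty path.
    if parts.scheme and parts.netloc and parts.path == "":
        return uri + "/"
    return uri


if __name__ == "__main__":
    # Bucket-only URI gains a root path; a URI with an object path is untouched.
    print(with_root_path("gs://my-bucket"))
    print(with_root_path("gs://my-bucket/data/file.json"))
```

With this in place the read call would become `spark.read.json(with_root_path("gs://my-bucket"))`. The bucket itself stays the same; only the string handed to Spark changes. Pointing at a specific object or prefix inside the bucket should avoid the same problem.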

Spark 2.0: Relative path in absolute URI (spark-warehouse)

Willi Mentzel
sopana
