I am coding in a Jupyter notebook on GCP Dataproc. Below is the code I am running:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("yarn") \
.appName('1.2. BigQuery Storage & Spark SQL - Python') \
.config('spark.jars','gs://dev-pysparkfiles/Spark-Bigquery-Connector-2.12-0.24.2.jar') \
.config('spark.jars.packages','com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2') \
.getOrCreate()
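(For context, sql is a plain BigQuery SQL string defined earlier in the notebook; the dataset and table names below are hypothetical placeholders, not the real ones:)
# hypothetical example; the actual query is a similar standard-SQL SELECT
sql = "SELECT * FROM my_dataset.my_table LIMIT 10"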
df = spark.read.format("com.google.cloud.spark.bigquery") \
.option("materializationDataset", "ABC_HK_STG_TEMP_SIT") \
.option("materializationExpirationTimeInMinutes", "1440") \
.option("query", sql) \
.load()
I tried changing com.google.cloud.spark.bigquery to bigquery.
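The changed read call looked roughly like this (identical to the code above except for the format string; sql and the options are unchanged):
df = spark.read.format("bigquery") \
.option("materializationDataset", "ABC_HK_STG_TEMP_SIT") \
.option("materializationExpirationTimeInMinutes", "1440") \
.option("query", sql) \
.load()
But it is still giving me the error: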
java.lang.ClassNotFoundException: Failed to find data source: bigquery
I also removed the spark.jars.packages config when creating the Spark session, as suggested in another Stack Overflow answer, but I am still getting the same error.
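For reference, that attempt built the session roughly like this (the same as my first snippet, just without the spark.jars.packages line):
spark = SparkSession.builder.master("yarn") \
.appName('1.2. BigQuery Storage & Spark SQL - Python') \
.config('spark.jars','gs://dev-pysparkfiles/Spark-Bigquery-Connector-2.12-0.24.2.jar') \
.getOrCreate()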