Good day,
I am new to Google Cloud Storage and have recently been assigned a task to write data to a GCS bucket. I've done this before for S3, but I'm not sure how to do it with GCS. I have found some sample code here and there (like the one in this link or this one), but none of it is what I need. This is what has been provided to me:
bucket_name = {
    google_storage_hmac_access_id = "SOMEKEY"
    google_storage_hmac_secret = "SOMEKEY"
}
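In my script I simply put these into two plain variables (values redacted; the names gcs_key and gcs_secret are my own), which are what I use in the Hadoop configuration further down:

# Placeholder assignments; the real values are the HMAC access id and secret shown above
gcs_key = 'SOMEKEY'
gcs_secret = 'SOMEKEY'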
The approach in the first link requires a JSON file for credentials, which is not what I have in hand. So I used the approach in the second link and added the following to my code:
spark_context._jsc.hadoopConfiguration().set(
    'fs.gs.impl', 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem'
)
# This is required if you are using a service account; set it to 'true' in that case
spark_context._jsc.hadoopConfiguration().set(
    'fs.gs.auth.service.account.enable', 'false'
)
# The following are required if you are using OAuth
spark_context._jsc.hadoopConfiguration().set(
    'fs.gs.auth.client.id', gcs_key
)
spark_context._jsc.hadoopConfiguration().set(
    'fs.gs.auth.client.secret', gcs_secret
)
where gcs_key and gcs_secret are those provided to me to connect to that bucket. And this is set as my path:
gs://bucket_name
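For reference, this is roughly how I then try to write (the DataFrame and the output sub-path are just placeholders):

# df is an existing Spark DataFrame; 'some_output_path' is a placeholder
df.write.mode('overwrite').parquet('gs://bucket_name/some_output_path')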
When I try this, it ends up opening a login page asking me to grant access to GCS with an email address, which is clearly not what I want here. I am looking for a working example of how to read/write data from/to a GCS bucket using those credentials.
Note 1: I have used the same access_id and secret to set up gsutil, and everything seems to be working fine.
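(For reference, the relevant part of the ~/.boto file that gsutil config generated looks roughly like this, with the keys redacted:)

[Credentials]
gs_access_key_id = SOMEKEY
gs_secret_access_key = SOMEKEY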
Note 2: I have included the required JAR file in the Spark jars directory (gcs-connector-hadoop3-latest.jar).
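For completeness, the session itself is created in the usual way (the app name is just a placeholder), and spark_context in the snippet above comes from it:

from pyspark.sql import SparkSession

# Nothing special here; the GCS connector JAR is picked up from the Spark jars directory
spark = SparkSession.builder.appName('gcs-write-test').getOrCreate()
spark_context = spark.sparkContext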