I am trying to write a Spark DataFrame to Google Cloud Storage. The DataFrame carries incremental updates, so I need a partition strategy, which means I need to write it to an exact file path in GCS.
I created the Spark session as follows:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .config("fs.gs.project.id", project_id)
    .config("fs.gs.auth.service.account.enable", "true")
    .config("fs.gs.auth.service.account.project.id", project_id)
    .config("fs.gs.auth.service.account.private.key.id", private_key_id)
    .config("fs.gs.auth.service.account.private.key", private_key)
    .config("fs.gs.auth.service.account.client.email", client_email)
    .config("fs.gs.auth.service.account.email", client_email)
    .config("fs.gs.auth.service.account.client.id", client_id)
    .config("fs.gs.auth.service.account.auth.uri", auth_uri)
    .config("fs.gs.auth.service.account.token.uri", token_uri)
    .config("fs.gs.auth.service.account.auth.provider.x509.cert.url", auth_provider_x509_cert_url)
    .config("fs.gs.auth.service.account.client_x509_cert_url", client_x509_cert_url)
    .config("spark.sql.avro.compression.codec", "deflate")
    .config("spark.sql.avro.deflate.level", "5")
    .getOrCreate())
and I am writing to GCS with:

df.write.format(file_format).save(f"gs://{bucket_name}{path}/{table_name}/file_name.avro")
What I actually see written in GCS is a directory at that path containing an auto-named part file:

gs://bucket_name/table_name/file_name.avro/--auto assigned name--.avro

because Spark's DataFrameWriter treats the save path as an output directory, not a file name. What I expect is a single data file at the exact path, as when writing a file in Hadoop:

gs://bucket_name/table_name/file_name.avro
Can anyone help me achieve this?
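For reference, the workaround I am considering is sketched below. It is untested, and the helper name and temp suffix are my own placeholders; I am assuming the GCS connector's Hadoop FileSystem implementation supports listStatus() and rename() on gs:// paths (rename is a copy-and-delete on GCS, so it is not atomic). The idea: coalesce to one partition so Spark emits a single part file, write it to a temporary directory, then rename that part file to the exact target path and delete the temp directory.

def write_single_file(df, file_format, target_uri):
    # Write exactly one part file into a temporary directory.
    tmp_uri = target_uri + '_tmp'
    df.coalesce(1).write.format(file_format).mode('overwrite').save(tmp_uri)

    # Get the Hadoop FileSystem bound to the gs:// scheme via the JVM gateway.
    Path = spark._jvm.org.apache.hadoop.fs.Path
    conf = spark._jsc.hadoopConfiguration()
    fs = Path(tmp_uri).getFileSystem(conf)

    # Rename the auto-named part file to the exact target path.
    for status in fs.listStatus(Path(tmp_uri)):
        if status.getPath().getName().startswith('part-'):
            fs.rename(status.getPath(), Path(target_uri))

    # Clean up the temporary directory (including the _SUCCESS marker).
    fs.delete(Path(tmp_uri), True)

write_single_file(df, file_format, f"gs://{bucket_name}{path}/{table_name}/file_name.avro")

This coalesce(1) forces all data through a single task, which won't scale for large DataFrames, so I would prefer a cleaner solution if one exists.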