
I'm using the code below to write a Spark DataFrame to an S3 bucket.

(
    spark_df
    .coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(bucket_name + "/" + bucket_path + "/csv")
)

Here I want to get the name of the file that is being written to the S3 bucket, so I can use that file in a later section of the code.

Specifying the filename when saving a DataFrame as a CSV

I have gone through the question above; according to it, we can't specify a file name while writing the DataFrame to an S3 bucket.

I'm thinking of iterating over the S3 bucket and getting the file based on the latest timestamp (mostly only one file will be written at a time).

Could someone suggest how to get the filename (using Python) from the S3 bucket based on the latest timestamp?
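
A minimal boto3 sketch of that idea (get_latest_csv_key is a hypothetical helper name; it assumes bucket_name is the bare bucket name, without an s3:// scheme, and that the prefix matches the write path above):

import boto3

def get_latest_csv_key(bucket_name, prefix):
    # List the objects under the output prefix (assumes fewer than 1000
    # objects, i.e. a single page of list_objects_v2 results).
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    csv_objects = [
        obj for obj in response.get("Contents", [])
        if obj["Key"].endswith(".csv")
    ]
    if not csv_objects:
        return None
    # LastModified is a datetime, so max() picks the most recent upload.
    return max(csv_objects, key=lambda obj: obj["LastModified"])["Key"]

# e.g. latest_key = get_latest_csv_key(bucket_name, bucket_path + "/csv")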

data_addict

1 Answer


Every partition in the job will create its own file, so work on a directory-by-directory basis instead: all the files created are the output. Maybe if you play with .repartition(1) you could merge everything down to one file; you could try some experiments there.
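
A minimal sketch of that directory-oriented approach, assuming the same spark session and path variables as in the question: downstream code reads the whole output directory back rather than hunting for one part file.

# Spark treats all part files under the prefix as a single dataset,
# so no individual filename is needed.
df = (
    spark.read
    .option("header", "true")
    .csv(bucket_name + "/" + bucket_path + "/csv")
)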

stevel