
I'm using the code below to write a Spark DataFrame to an S3 bucket.

(
    spark_df
    .coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(bucket_name + "/" + bucket_path + "/csv")
)

Here I want to get the name of the file that is being written to the S3 bucket, so I can use that file in a later section of the code.

Specifying the filename when saving a DataFrame as a CSV

I have gone through the question above; according to it, we can't specify a file name while writing the DataFrame to an S3 bucket.

I'm thinking of iterating over the S3 bucket and getting the file based on the latest timestamp (mostly only one file will be written at a time).

Could someone suggest how to get the filename (using Python) from the S3 bucket based on the latest timestamp?
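
A minimal boto3 sketch of that idea (get_latest_csv_key is a hypothetical helper name; it assumes bucket_name is the bare bucket name, without an s3:// scheme, and that the prefix matches the write path above):

import boto3

def get_latest_csv_key(bucket_name, prefix):
    # List the objects under the output prefix (assumes fewer than 1000
    # objects, i.e. a single page of list_objects_v2 results).
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    csv_objects = [
        obj for obj in response.get("Contents", [])
        if obj["Key"].endswith(".csv")
    ]
    if not csv_objects:
        return None
    # LastModified is a datetime, so max() picks the most recent upload.
    return max(csv_objects, key=lambda obj: obj["LastModified"])["Key"]

# e.g. latest_key = get_latest_csv_key(bucket_name, bucket_path + "/csv")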

data_addict

1 Answer


Every partition in the job will create its own file, so work on a directory-by-directory basis instead: all the files created are the output. Maybe if you play with .repartition(1) you could merge everything down to one file; you could try some experiments there.
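
A minimal sketch of that directory-oriented approach, assuming the same spark session and path variables as in the question: downstream code reads the whole output directory back rather than hunting for one part file.

# Spark treats all part files under the prefix as a single dataset,
# so no individual filename is needed.
df = (
    spark.read
    .option("header", "true")
    .csv(bucket_name + "/" + bucket_path + "/csv")
)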

stevel