
I'm using this code to save my DataFrame to a specific S3 bucket location:

```python
# coalesce(1) -> one output partition, so Spark writes a single part file
df.coalesce(1).write \
    .format("csv") \
    .mode("append") \
    .save(f"s3://{bucket_output}/{dirname}/{filename}", header=True, nullValue='\u0000', emptyValue='\u0000')
```

I couldn't find any information on the web about changing the location and name of such a .csv file using Python in a Glue job. Currently, the CSV is not saved as a file named `filename`; instead, it is saved inside a directory named `filename`, and the file itself is named `part-(some_numbers).csv`.
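Spark treats the path passed to `save()` as a directory and writes one `part-*` file per partition into it; `coalesce(1)` only guarantees there is a single such file. A first step toward renaming is therefore to find that file's key among the objects under the prefix. The sketch below is a hypothetical helper (`find_single_part_csv` is not part of Spark or boto3); it assumes uncompressed `.csv` output and that marker objects such as `_SUCCESS` may also be present. Because `mode("append")` keeps part files from earlier runs, it refuses to guess when more than one part file exists.

```python
from typing import Iterable


def find_single_part_csv(keys: Iterable[str], prefix: str) -> str:
    """Return the single Spark part file (part-*.csv) directly under `prefix`.

    `keys` are S3 object keys, e.g. collected from list_objects_v2 results.
    Marker objects such as `_SUCCESS` are ignored. Raises ValueError unless
    exactly one CSV part file is found (several runs with mode("append")
    would leave several).
    """
    if not prefix.endswith("/"):
        prefix += "/"  # treat the Spark output path as a "directory"
    candidates = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if "/" in name:
            continue  # skip anything in nested "subdirectories"
        if name.startswith("part-") and name.endswith(".csv"):
            candidates.append(key)
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one part-*.csv under {prefix!r}, found {len(candidates)}"
        )
    return candidates[0]


# Hypothetical keys, shaped like what Spark leaves under s3://<bucket>/reports/daily.csv/
example_keys = [
    "reports/daily.csv/_SUCCESS",
    "reports/daily.csv/part-00000-abc-c000.csv",
]
print(find_single_part_csv(example_keys, "reports/daily.csv"))
# reports/daily.csv/part-00000-abc-c000.csv
```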


How can I get around this? Is there a move operation on the S3 bucket, or something similar?
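S3 has no native rename or move call; a "move" is a copy to the new key followed by a delete of the old key. A minimal sketch, assuming `s3` is a boto3 S3 client (`boto3.client("s3")`); `move_s3_object` is a hypothetical helper name. `copy_object` copies objects up to 5 GB in one request; for larger objects the client's managed `copy` transfer would be the thing to use instead.

```python
def move_s3_object(s3, bucket: str, src_key: str, dest_key: str) -> None:
    """'Rename' an object within one bucket by copying it, then deleting the source.

    `s3` is assumed to be a boto3 S3 client. The delete only runs if
    copy_object returned without raising.
    """
    s3.copy_object(
        Bucket=bucket,
        Key=dest_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=bucket, Key=src_key)
```

Usage would look like `move_s3_object(boto3.client("s3"), bucket_output, part_key, f"{dirname}/{filename}")`, where `part_key` is the key of the `part-*.csv` file. Since S3 keys are flat, an object named `dirname/filename` can coexist with objects under `dirname/filename/`, but writing the Spark output to a separate temporary prefix first is probably less confusing.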

Dawid_K
  • Does this answer your question? [Specifying the filename when saving a DataFrame as a CSV](https://stackoverflow.com/questions/41990086/specifying-the-filename-when-saving-a-dataframe-as-a-csv) – boyangeor Jul 18 '23 at 05:05
  • The answer you linked is based on Scala, not Python. – Dawid_K Jul 18 '23 at 08:01
  • The point is that you cannot set the file name via Spark; the file has to be renamed afterwards. How to rename it depends on the underlying storage system. A starting point for S3: [link](https://stackoverflow.com/questions/32501995/boto3-s3-renaming-an-object-using-copy-object) – boyangeor Jul 18 '23 at 09:10
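After the part file has been copied to its final name, the rest of Spark's output "directory" (for example a `_SUCCESS` marker) is still there. A sketch of cleaning it up with boto3, assuming `s3` is a boto3 S3 client; `delete_prefix` is a hypothetical helper. It lists every key under the prefix first (following `ContinuationToken` pages) and then deletes in batches, because `delete_objects` accepts at most 1000 keys per request. A trailing `/` is forced onto the prefix so that a renamed object at `dirname/filename` is not itself deleted.

```python
def delete_prefix(s3, bucket: str, prefix: str) -> int:
    """Delete every object under the "directory" `prefix`; return the count.

    `s3` is assumed to be a boto3 S3 client.
    """
    if not prefix.endswith("/"):
        prefix += "/"  # don't match the sibling key "dirname/filename" itself
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        resp = s3.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no objects
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        if not resp.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
    # delete_objects accepts at most 1000 keys per request
    for i in range(0, len(keys), 1000):
        batch = [{"Key": k} for k in keys[i:i + 1000]]
        s3.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
    return len(keys)
```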
