How do I assign a predefined name to Parquet files in an AWS Glue job?

For example, after my job runs, a Parquet file gets stored in the specified folder with a name like:

part-00000-fc95461f-00da-437a-9396-93c7ea473720.snappy.parquet, part-00000-tc95431f-00ds-437b-9396-93c7ea473720.snappy.parquet

I want the files to be stored with a predefined or structured name, like:

part-00000-12Jan2018.snappy.parquet, part-00000-13Jan2018.snappy.parquet

etc.

Kishore Bharathy

1 Answer


Because of how Spark writes its output, we can't currently choose the file names ourselves. An alternative is to rename the files as soon as they are written to S3 (or the data lake). I found these answers to be helpful.
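Here is a rough sketch of the rename step in Python, in case it helps. S3 has no real rename, so each file is copied to the new key and the original is then deleted. The function names `target_key` and `rename_part_files`, the regex, and the date label format are my own assumptions for illustration, not anything Glue or Spark provides; `s3` is assumed to be a boto3 S3 client that you create yourself.

```python
import re

# Assumed pattern for Spark part files: part-<number>-<unique id>.<ext...>.parquet
PART_FILE = re.compile(
    r"^(?P<dir>.*/)?(?P<part>part-\d+)-[^./]+(?P<ext>(?:\.[^./]+)*\.parquet)$"
)


def target_key(key, date_label):
    """Return the key with the unique id replaced by date_label, or None if no match."""
    match = PART_FILE.match(key)
    if match is None:
        return None
    return f"{match.group('dir') or ''}{match.group('part')}-{date_label}{match.group('ext')}"


def rename_part_files(s3, bucket, prefix, date_label):
    """Copy each matching part file under prefix to its new key, then delete the original.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3.get_paginator("list_objects_v2")
    # Collect all keys first so the listing is not affected by the renames.
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]
    renamed = []
    for key in keys:
        new_key = target_key(key, date_label)
        # Skip non-part files and files that already have the target name;
        # copying a key onto itself and then deleting it would lose the data.
        if new_key is None or new_key == key:
            continue
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
        renamed.append((key, new_key))
    return renamed
```

You could call something like `rename_part_files(boto3.client("s3"), "my-bucket", "output/", "12Jan2018")` at the end of the Glue script, after the write has finished; the bucket and prefix here are placeholders. Note that `copy_object` only handles objects up to 5 GB, so larger files would need a multipart copy instead.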

Sandeep Singh