
I am reading data from a table "V" in an S3 input bucket and loading it into an S3 output bucket using AWS Glue, with PySpark as the script language.

Could you please help me: how can I write the output file to the S3 output bucket with a name of the form "job-name_current-timestamp"?

That is, the output file that appears in the S3 output bucket folder should automatically be named with the job name and a timestamp.
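For reference, here is a minimal sketch of the kind of Glue job described above; the database name "mydb", the table name "v", and the bucket name "s3-output-bucket" are placeholder assumptions:

```python
import sys
from datetime import datetime

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue boilerplate: the job name arrives via the JOB_NAME argument.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table (here via the Glue Data Catalog) and convert to a DataFrame.
dyf = glue_context.create_dynamic_frame.from_catalog(database="mydb", table_name="v")
df = dyf.toDF()

# Spark names its part files itself (part-00000-...), so the job name and
# timestamp can at least go into the output *prefix*; renaming the file
# itself is sketched after the comments below.
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
output_path = f"s3://s3-output-bucket/{args['JOB_NAME']}_{timestamp}/"
df.coalesce(1).write.mode("overwrite").csv(output_path, header=True)

job.commit()
```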

  • You need to write a custom function that reads the partition data and writes the file with the expected file-name pattern. Since Spark uses parallel processing, take care not to write files with the same name to the same location, as that can fail the job or let tasks in the same Spark job overwrite each other's output. Here is a similar question: https://stackoverflow.com/questions/36107581/change-output-filename-prefix-for-dataframe-write – yogesh garud Mar 26 '22 at 20:03
  • I expected the same for PySpark, with an example. Kindly help me here. – Venu Mar 28 '22 at 11:48
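A minimal sketch of the rename step the first comment suggests, assuming the job first wrote a single part file (via `coalesce(1)`) under a temporary prefix; the bucket name and prefixes are placeholder assumptions. S3 objects cannot be renamed in place, so the part file is copied to a key built from the job name and timestamp, then the temporary object is deleted:

```python
import sys
from datetime import datetime

import boto3
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

bucket = "s3-output-bucket"   # assumption: hypothetical bucket name
temp_prefix = "tmp-output/"   # assumption: prefix the job wrote its part file to
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
target_key = f"output/{args['JOB_NAME']}_{timestamp}.csv"

s3 = boto3.client("s3")

# Locate the single part file Spark produced under the temporary prefix.
part_key = next(
    obj["Key"]
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=temp_prefix)["Contents"]
    if obj["Key"].endswith(".csv")
)

# S3 has no rename, so copy the part file to the desired key, then delete the original.
s3.copy_object(Bucket=bucket, CopySource={"Bucket": bucket, "Key": part_key}, Key=target_key)
s3.delete_object(Bucket=bucket, Key=part_key)
```

Note the design constraint the comment points at: each Spark task writes its own part file, so writing to a unique temporary prefix per run and renaming afterwards avoids concurrent tasks clobbering a shared file name.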

0 Answers