I am writing data into an S3 bucket and creating Parquet files using PySpark. My bucket structure looks like this:
s3a://rootfolder/subfolder/table/
The two folders subfolder and table should be created at run time if they do not exist, and if they already exist the Parquet files should go inside the table folder.
When I run the PySpark program from my local machine it creates an extra marker object ending in _$folder$ (like table_$folder$), but when the same program is run from EMR it creates a _SUCCESS file instead.
Writing into S3 (PySpark program):
data.write.parquet("s3a://rootfolder/sub_folder/table/", mode="overwrite")
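For completeness, a minimal self-contained sketch of what I am running (the SparkSession setup and the sample columns are my assumption; only the write call and path are the actual code):

from pyspark.sql import SparkSession

# Assumed setup: a SparkSession with the s3a connector available on the classpath.
spark = SparkSession.builder.appName("write-parquet-to-s3").getOrCreate()

# Hypothetical sample data; in my real job `data` is an existing DataFrame.
data = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# The write itself; Spark creates the missing key prefixes ("folders") automatically.
data.write.parquet("s3a://rootfolder/sub_folder/table/", mode="overwrite")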
Is there a way to create the folders in S3 only if they do not exist, and not create markers like table_$folder$ or _SUCCESS?
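The only related setting I am aware of is the standard Hadoop committer option mapreduce.fileoutputcommitter.marksuccessfuljobs, which disables the _SUCCESS file; a minimal sketch of setting it is below (untested in my setup, and I do not know of an equivalent for the _$folder$ markers, which is essentially what I am asking):

from pyspark.sql import SparkSession

# Assumption: passing the Hadoop option through the spark.hadoop.* prefix at session
# creation disables the _SUCCESS marker; the _$folder$ objects are not affected by it.
spark = (
    SparkSession.builder
    .appName("write-parquet-no-success-marker")
    .config("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
    .getOrCreate()
)

data = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
data.write.parquet("s3a://rootfolder/sub_folder/table/", mode="overwrite")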