When I write my dataframe to S3 using
df.write
.format("parquet")
.mode("overwrite")
.partitionBy("year", "month", "day", "hour", "gen", "client")
.option("compression", "gzip")
.save("s3://xxxx/yyyy")
I get the following entries in S3:
year=2018
year=2019
but I would like to have this instead:
year=2018
year=2018_$folder$
year=2019
year=2019_$folder$
The scripts that read from that S3 location depend on the *_$folder$
entries, but I haven't found a way to configure Spark/Hadoop to generate them.
Any idea which Hadoop or Spark configuration setting controls the generation of the *_$folder$
files?
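
For context, the fallback I'm trying to avoid is creating those marker objects myself after the write, through the Hadoop FileSystem API. A rough sketch of that idea (Scala, assuming a SparkSession named spark is in scope, and only handling the top-level year= directories; the deeper partition levels would need the same treatment):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val basePath = "s3://xxxx/yyyy"  // same base path passed to .save(...)
val fs = FileSystem.get(new URI(basePath), spark.sparkContext.hadoopConfiguration)

// Drop a zero-byte <name>_$folder$ marker next to each top-level partition directory.
fs.listStatus(new Path(basePath))
  .filter(_.isDirectory)
  .foreach { status =>
    val marker = new Path(status.getPath.getParent, status.getPath.getName + "_$folder$")
    if (!fs.exists(marker)) fs.create(marker, true).close()
  }

Doing this by hand after every write is exactly what I'm hoping a configuration setting can avoid.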