I'm looking for a way to access the unique part(s) of the parquet filename when saving a Spark DataFrame as Parquet with PySpark.
I just read in Change output filename prefix for DataFrame.write() that changing the output filename prefix for DataFrame.write() is not possible, but I'd like to know whether there is a way to access the values that the RecordWriter uses to build up the filename.
I had a look at the source code and saw that it is configuration.get("spark.sql.sources.writeJobUUID"). Does this property get initialized earlier, and is it also accessible through PySpark?
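For reference, this is roughly what I have poked at from the PySpark side so far (a sketch only; I don't know whether either of these places is where the property lives, or whether it is only set inside the write job itself, hence the question):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Check the Spark conf; returns None if the property was never set here.
print(sc.getConf().get("spark.sql.sources.writeJobUUID", None))

# Check the underlying Hadoop configuration via the JVM gateway;
# sc._jsc is internal API, so this is just an exploratory poke.
print(sc._jsc.hadoopConfiguration().get("spark.sql.sources.writeJobUUID"))
```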
I'd like to use it for logging purposes, to match a specific Spark job to the Parquet files it wrote (so I can, e.g., remove all output written by a specific job across different output partitions).
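To illustrate the end goal, here is a sketch of the cleanup I have in mind, assuming I could get hold of the UUID (job_uuid and the output path below are hypothetical placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

job_uuid = "..."  # hypothetical: the UUID logged when the job wrote its output

# Walk an output partition via the Hadoop FileSystem API and delete only
# the part files whose names contain that job's UUID.
hadoop = sc._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
for status in fs.listStatus(hadoop.fs.Path("/data/output/date=2016-01-01")):
    part = status.getPath()
    if job_uuid in part.getName():
        fs.delete(part, False)
```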