I have a bunch of Parquet data in a directory structure like this: col1=1/col2=2/col3=3/part-00000-33b48309-0442-4e86-870f-f3070268107f-c000.snappy.parquet
I've read what I could find, and it seems fairly clear what each part of the file name means: part-00000 increments per file in the partition, c000 has something to do with another part of the output configuration, and the rest is a UUID to prevent collisions during parallel writes.
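To make that concrete, here is a rough sketch of how I'm reading the name, written as a small Python parser. The regex and the helper name split_part_name are my own; the layout it assumes (five-digit index, UUID, three-digit counter, optional codec) is inferred from the one example above, not from any documented spec.

```python
import re

# Assumed layout, inferred from a single example file name:
#   part-<5-digit index>-<UUID>-c<3-digit counter>[.<codec>].parquet
PART_FILE = re.compile(
    r"^part-(?P<index>\d{5})"
    r"-(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-c(?P<counter>\d{3})"
    r"(?:\.(?P<codec>[a-z0-9]+))?\.parquet$"
)


def split_part_name(path):
    """Split a part file path into its named pieces (hypothetical helper)."""
    name = path.rsplit("/", 1)[-1]  # drop the col1=1/col2=2/... directories
    match = PART_FILE.match(name)
    if match is None:
        raise ValueError(f"unexpected part file name: {name!r}")
    return match.groupdict()


parts = split_part_name(
    "col1=1/col2=2/col3=3/"
    "part-00000-33b48309-0442-4e86-870f-f3070268107f-c000.snappy.parquet"
)
print(parts["index"], parts["uuid"], parts["counter"], parts["codec"])
```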
I'm wondering: which parts of the file name can I change or get rid of? Specifically, is it safe to just remove the UUID?
(The larger motivation: I need to add data to an existing store over time while keeping N files per partition. Since you can't overwrite the files you're reading, I have to stage the new files and then copy them over, and that would be easier with known file names.)
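For illustration, the staging step I have in mind looks roughly like the sketch below. Everything here is an assumption on my part: the helper name promote_staged_files, the local-filesystem paths, and the UUID-free target names of the form part-00000.snappy.parquet. Whether readers accept names like that is exactly what I'm asking.

```python
import os
from pathlib import Path


def promote_staged_files(staging_dir, target_dir, n_files):
    """Move exactly n_files staged Parquet files into target_dir under fixed names.

    Hypothetical helper: the target names drop the UUID and the c000 counter.
    """
    staging_dir, target_dir = Path(staging_dir), Path(target_dir)
    staged = sorted(staging_dir.glob("*.parquet"))
    if len(staged) != n_files:
        raise ValueError(f"expected {n_files} staged files, found {len(staged)}")

    target_dir.mkdir(parents=True, exist_ok=True)
    final_paths = []
    for i, src in enumerate(staged):
        dst = target_dir / f"part-{i:05d}.snappy.parquet"  # assumed naming scheme
        # os.replace overwrites an existing dst; it requires src and dst
        # to be on the same filesystem.
        os.replace(src, dst)
        final_paths.append(dst)
    return final_paths
```

With fixed names, each run replaces the same N paths in a partition instead of leaving a mix of old and new UUID-named files behind.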