I am trying to copy specific files from Bucket A to Bucket B. Bucket A has a directory structure, whereas Bucket B will be flat (no directories). The challenge is that each output file must be named after its original filename. Normally, I would create a custom filename policy and modify it as necessary. However, the only way I know to get the original filename is to go through each element and pull it from the element's metadata. How can I get access to each element inside TextIO.write?
I've considered adding a transform before TextIO.write that takes a PCollection of elements and outputs a PCollection of KV, where the key is the original filename and the value is the element (similar to this example). However, if I do that, how does my writer know how to write a KV?
I was able to get this working in a hacky way by using writeDynamic and partitioning by each element's filename in a SerializableFunction. I could then pass the partition type through to my filename policy and get the result I wanted. That said, this seems far from efficient, and writeDynamic wasn't designed for this use case, since I don't actually need to partition anything.
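To make the naming requirement concrete, here is a minimal sketch of the kind of mapping I want the filename policy to apply. It uses only the Java standard library and no Beam types. The class name `FlatNames`, the method `flatName`, and the `gs://bucket-a/...` path are hypothetical, and joining the directory segments with underscores is just one assumed way to flatten a path:

```java
/** Hypothetical helper: maps a structured object path to a flat file name. */
final class FlatNames {
    private FlatNames() {}

    /**
     * Drops the scheme and bucket (if present) and joins the remaining
     * path segments with underscores, so the result contains no '/'.
     */
    static String flatName(String sourcePath) {
        String rest = sourcePath;

        // Strip "scheme://bucket/" when the path is fully qualified.
        int scheme = rest.indexOf("://");
        if (scheme >= 0) {
            rest = rest.substring(scheme + 3);
            int firstSlash = rest.indexOf('/');
            rest = firstSlash >= 0 ? rest.substring(firstSlash + 1) : "";
        }

        // Remove any leading slashes left on a relative or absolute path.
        while (rest.startsWith("/")) {
            rest = rest.substring(1);
        }

        // Assumption: directory separators become underscores.
        return rest.replace('/', '_');
    }
}
```

Keeping the directory segments in the name, rather than only the last segment, is meant to avoid collisions when two directories in Bucket A contain files with the same name. The open question is still how to feed the original path of each element into a function like this from within the write step.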