I would like to save each partition of my generated RDD under a custom filename, such as chunk0.gz, chunk1.gz, etc. The files should be gzipped as well.

Calling saveAsTextFile creates a directory containing files with the standard names part-00000.gz, part-00001.gz, etc.:

fqPart.saveAsTextFile(outputFolder, classOf[GzipCodec])

How do I specify my own filenames? Would I have to iterate through the RDD partitions manually, write each one to a file myself, and then compress the resulting files as well?

Thanks in advance.

Laurens
  • See http://stackoverflow.com/a/36108367/5344058: although that question refers to saving Dataframes (and not RDDs), the same answer holds. – Tzach Zohar Oct 27 '16 at 14:40
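
For the manual approach raised in the question (iterating over the partitions and compressing each one yourself), here is a minimal sketch, not a definitive implementation. It assumes the RDD holds strings, one record per line. ChunkWriter, chunkName and writeGzipped are hypothetical names introduced only for illustration. The helper gzips while writing, so no separate compression pass is needed.

```scala
import java.io.{BufferedWriter, OutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

// Hypothetical helper, not part of Spark.
object ChunkWriter {
  // File name for the partition with the given index, e.g. chunk0.gz.
  def chunkName(index: Int): String = s"chunk$index.gz"

  // Writes the lines gzip-compressed to `out`, one record per line,
  // and closes `out` when done (also on failure).
  def writeGzipped(lines: Iterator[String], out: OutputStream): Unit = {
    val writer = new BufferedWriter(
      new OutputStreamWriter(new GZIPOutputStream(out), StandardCharsets.UTF_8))
    try {
      lines.foreach { line =>
        writer.write(line)
        writer.write("\n")
      }
    } finally {
      writer.close()
    }
  }
}

// Possible use from Spark, shown as a comment because it needs the Spark
// and Hadoop jars. Assumes fqPart is an RDD[String] and outputFolder is a
// String naming an existing output location:
//
//   fqPart.mapPartitionsWithIndex { (idx, lines) =>
//     val path = new org.apache.hadoop.fs.Path(outputFolder, ChunkWriter.chunkName(idx))
//     val fs = path.getFileSystem(new org.apache.hadoop.conf.Configuration())
//     ChunkWriter.writeGzipped(lines, fs.create(path))
//     Iterator.single(idx)
//   }.count() // an action is needed to trigger the writes
```

Unlike saveAsTextFile, this bypasses Spark's output committer, so a failed or retried task may leave a partial file behind or overwrite an earlier one. The alternative in the linked answer, saving normally and then renaming the part files, avoids that.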

0 Answers