I have an output RDD in my Spark code, written in Python. I want to save it to Amazon S3 as a gzipped file. I have tried the following approaches. The function below correctly saves the output RDD to S3, but not in gzipped format:
output_rdd.saveAsTextFile("s3://<name-of-bucket>/")
The function below returns the error `TypeError: saveAsHadoopFile() takes at least 3 arguments (3 given)`:
output_rdd.saveAsHadoopFile("s3://<name-of-bucket>/",
compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"
)
Please guide me on the correct way to do this.