I'm trying to read a large number of gzipped CSV files from S3 with PySpark. Normally textFile and spark-csv auto-decompress gzip input, but the files I'm working with lack the .gz extension, so Spark doesn't decompress them and I get the raw compressed bytes instead of CSV rows. There are millions of files, they're owned by another team, and they're updated multiple times a day.
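For context, the content itself is fine: a gzip stream is self-describing (it starts with the magic bytes `1f 8b`), and it's only the codec selection that keys off the file extension. A minimal stdlib sketch of what I mean, with a made-up payload standing in for one S3 object:

```python
import gzip

# Stand-in for the bytes of one S3 object whose key has no .gz suffix.
payload = gzip.compress(b"a,b,c\n1,2,3\n")

# The stream identifies itself by its magic bytes, not its filename...
assert payload[:2] == b"\x1f\x8b"

# ...so it decompresses fine regardless of extension.
rows = gzip.decompress(payload).decode("utf-8").splitlines()
print(rows)  # ['a,b,c', '1,2,3']
```

So decompressing by hand is possible; the problem is getting Spark to do it at scale without the extension hint.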
Is there a way to explicitly tell textFile or the spark-csv API which compression codec to use? Or is there any other way around copying and renaming the files?