I have seen similar questions for Java/Scala, but how do I import files compressed in zip/gzip/tar format into PySpark without actually decompressing them to disk first?
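For a plain gzip-compressed file (not a tar archive) I know Spark handles the decompression transparently, since gzip is a supported codec. For example, assuming a hypothetical file `data.csv.gz`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark decompresses .gz transparently for text-based sources
df = spark.read.csv("data.csv.gz", header=True, inferSchema=True)
```

But zip and tar archives can contain multiple files, and as far as I can tell the built-in readers do not handle those, so that is the case I am stuck on.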
I would like to hear suggestions on:

1. how to get the list of files contained in a single archive, and
2. how to read each one into a Spark DataFrame using PySpark.

The output I am looking for is a mapping of filename to DataFrame, where each DataFrame holds the contents of the corresponding file. A rough sketch of what I mean is below.
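To make the question concrete, here is a sketch of the zip case that I put together; `archive.zip` is a hypothetical path, and I assume each archive member is a single-line-per-record UTF-8 CSV file. It produces the filename-to-DataFrame mapping I described, but it pulls every file through the driver, which is what I would like to avoid (or at least see a better pattern for):

```python
import zipfile

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

archive_path = "archive.zip"  # hypothetical local path

dataframes = {}
with zipfile.ZipFile(archive_path) as zf:
    # 1) list the member files without extracting anything to disk
    for name in zf.namelist():
        # 2) decode each member on the driver, parallelize its lines,
        #    and parse them as CSV rows (spark.read.csv accepts an
        #    RDD of strings as well as paths)
        lines = zf.read(name).decode("utf-8").splitlines()
        dataframes[name] = spark.read.csv(
            spark.sparkContext.parallelize(lines)
        )
```

Is there a way to do steps 1) and 2) without materializing the whole archive on the driver, and does a similar approach exist for tar/gzip?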
Thanks!