Suppose I have a tar.gz archive containing 7 CSV files. How can I read this archive so that each CSV file ends up in its own RDD or DataFrame?
I have tried the approach mentioned here, but I get all 7 CSV files in a single RDD, which is the same result as a plain sc.textFile() call.
I am using Spark 2.*
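
To make concrete what I am after, here is a rough sketch in Python of the result I would like. It is not a confirmed solution: it assumes the archive is small enough to unpack in driver memory, that the members are UTF-8 text, and that path matches a single archive. The names split_tar_gz and rdds_per_csv are hypothetical helpers I made up for illustration; sc.binaryFiles and sc.parallelize are the standard SparkContext calls.

```python
import io
import tarfile


def split_tar_gz(payload):
    """Return {member_name: [lines]} for each regular file in a tar.gz given as bytes."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            # Assumption: every member is UTF-8 encoded text
            text = archive.extractfile(member).read().decode("utf-8")
            result[member.name] = text.splitlines()
    return result


def rdds_per_csv(sc, path):
    """Hypothetical helper: build one RDD of lines per CSV found in the archive at path."""
    # binaryFiles yields (file_path, file_bytes) pairs, one pair per whole file
    payload = sc.binaryFiles(path).values().first()
    # Assumption: unpacking on the driver is acceptable for an archive this small
    return {name: sc.parallelize(lines)
            for name, lines in split_tar_gz(payload).items()}
```

With an existing SparkContext sc, rdds_per_csv(sc, "archive.tar.gz") would return a dict keyed by member name, so each CSV could then be parsed or converted to a DataFrame on its own. The obvious drawback is that the whole archive passes through the driver, which is why I am asking whether there is a better way.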