I have multiple zip files, each containing two files (A.csv and B.csv):
/data/jan.zip --> contains A.csv & B.csv
/data/feb.zip --> contains A.csv & B.csv
I want to read the contents of every A.csv file inside all of the zip files using PySpark. This is what I have so far:
textFile = sc.textFile("hdfs://<HDFS loc>/data/*.zip")
Can someone tell me how to get the contents of only the A.csv files into an RDD?
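For reference, here is a minimal sketch of the kind of thing I am after, not a tested solution. It assumes that `sc.binaryFiles` can be used to load each zip as a (path, bytes) pair and that Python's `zipfile` module can then open the bytes in memory. The helper names `extract_csv_lines` and `read_a_csv` are hypothetical, and the files are assumed to be UTF-8 encoded. Each zip would be read whole into memory on one executor, so this would only suit archives of moderate size.

```python
import io
import zipfile


def extract_csv_lines(path_and_bytes, member="A.csv"):
    """Return the text lines of `member` from one (path, zip bytes) pair."""
    _path, content = path_and_bytes
    lines = []
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        for name in archive.namelist():
            # Compare base names so A.csv inside a subfolder also matches.
            if name.split("/")[-1] == member:
                # Assumption: the CSV files are UTF-8 encoded.
                lines.extend(archive.read(name).decode("utf-8").splitlines())
    return lines


def read_a_csv(sc, pattern):
    """Build an RDD of A.csv lines from every zip matching `pattern`."""
    # binaryFiles yields one (path, bytes) record per matching file.
    return sc.binaryFiles(pattern).flatMap(extract_csv_lines)
```

With an existing SparkContext this would be called as `read_a_csv(sc, "hdfs://<HDFS loc>/data/*.zip")`. Header rows from each A.csv would still be present in the resulting RDD and would need to be filtered out separately.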