
I have the following problem: I have a directory containing .tar archives, each of which contains multiple .csv.gz files. I want to list all the .csv.gz files inside each parent .tar archive. I am working with Scala 2.11.7. The archive has this structure:

   file.tar
       |file1.csv.gz
             file11.csv
       |file2.csv.gz
             file21.csv
       |file3.csv.gz
             file31.csv 

From file.tar I want to get the list of files file1.csv.gz, file2.csv.gz, file3.csv.gz, so that I can then create a DataFrame from each .csv.gz file and apply some transformations.
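To make the goal concrete, here is a minimal sketch of the kind of listing I am after. It assumes Apache Commons Compress (`org.apache.commons.compress`) is on the classpath, which is an assumption on my part; the object name `TarListing` and the method `listCsvGz` are hypothetical names used only for illustration.

```scala
import java.io.{BufferedInputStream, FileInputStream}
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream

// Hypothetical helper, not part of any library.
object TarListing {

  /** Returns the names of all non-directory entries ending in ".csv.gz"
    * inside the tar archive at `tarPath`, in archive order.
    */
  def listCsvGz(tarPath: String): List[String] = {
    val in = new TarArchiveInputStream(
      new BufferedInputStream(new FileInputStream(tarPath)))
    try {
      // getNextTarEntry returns null once the end of the archive is reached.
      Iterator
        .continually(in.getNextTarEntry)
        .takeWhile(_ != null)
        .filter(e => !e.isDirectory && e.getName.endsWith(".csv.gz"))
        .map(_.getName)
        .toList // force evaluation before the stream is closed
    } finally {
      in.close()
    }
  }
}
```

Calling `TarListing.listCsvGz("file.tar")` should give the entry names exactly as they are stored in the archive, so they may carry a directory prefix depending on how the tar was created. This only lists names; it does not decompress the .csv.gz entries.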

Chaouki
  • Well, what have you tried? – airudah Mar 27 '18 at 09:03
  • Hello @Robert, I tried reading the gz files into a DataFrame using this [link](https://stackoverflow.com/questions/38635905/reading-in-multiple-files-compressed-in-tar-gz-archive-into-spark). It works, but I want to get a list of the file names because I'll read each file into its own DataFrame. – Chaouki Mar 27 '18 at 09:22

0 Answers