
I am trying to read data from a zip file.

I can read a whole text file as below:

val f = sc.wholeTextFiles("hdfs://")

but I don't know how to read the text data inside a zip file.

Is there any way to do it? If so, please let me know.

sande

1 Answer


You can create an RDD from the zip file with the newAPIHadoopFile method, using ZipFileInputFormat from the third-party com.cotdp.hadoop library.

import com.cotdp.hadoop.ZipFileInputFormat
import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job

// Each record is (entry name inside the zip, uncompressed entry bytes).
val zipFileRDD = sc.newAPIHadoopFile(
        "hdfs://tmp/sample_zip/LoanStats3a.csv.zip",
        classOf[ZipFileInputFormat],
        classOf[Text],
        classOf[BytesWritable],
        Job.getInstance().getConfiguration)
// copyBytes() returns only the valid bytes; getBytes() may include buffer padding.
println("The file contents are: " + zipFileRDD.map(s => new String(s._2.copyBytes(), "UTF-8")).first())
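
If adding the com.cotdp.hadoop dependency is not an option, one possible alternative is to unzip each archive yourself with the JDK's java.util.zip classes. The sketch below is only an illustration: the names ZipText and readEntries are made up for this example, and it assumes the entries are UTF-8 text that fits in memory.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.ZipInputStream

// Hypothetical helper, not part of Spark or Hadoop.
object ZipText {
  /** Reads every non-directory entry of a zip stream as UTF-8 text,
    * returning (entry name, entry contents) pairs. */
  def readEntries(in: InputStream): List[(String, String)] = {
    val zis = new ZipInputStream(in)
    try {
      Iterator
        .continually(zis.getNextEntry)
        .takeWhile(_ != null)          // getNextEntry returns null after the last entry
        .filterNot(_.isDirectory)
        .map { entry =>
          // Copy the current entry's bytes into memory.
          val out = new ByteArrayOutputStream()
          val buf = new Array[Byte](8192)
          var n = zis.read(buf)
          while (n != -1) { out.write(buf, 0, n); n = zis.read(buf) }
          (entry.getName, new String(out.toByteArray, StandardCharsets.UTF_8))
        }
        .toList                        // force evaluation before the stream is closed
    } finally zis.close()
  }
}
```

With Spark, this could be applied per file with something like sc.binaryFiles(path).flatMap { case (_, stream) => ZipText.readEntries(stream.open()) }. Each archive is then read entirely inside one task, so this approach suits many small archives better than one very large one.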
Roy Miller