Get a .tsv file from an archive in Java without unzipping the archive

Question

I have an archive, _2016_08_17.zip, that contains 8 .tsv files. I need to extract the file named hit_data.tsv and upload it to BigQuery. The files are in a bucket on Google Cloud Platform.

Can someone give me a simple program that opens the archive, finds the correct file, and prints its rows to the screen? I can take it from there. My idea is to replace the path gs://path_name/*hit_data.tsv with the buffer that contains the hit_data.tsv data.

    public static void main(String[] args) {
        Pipeline p = DataflowUtils.createFromArgs(args);

        p
                .apply(TextIO.Read.from("gs://path_name/*hit_data.tsv"))
                // .apply(Sample.<String>any(10))
                .apply(ParDo.named("ExtractRows").of(new ExtractRows('\t', "InformationDateID")))
                .apply(BigQueryIO.Write
                        .named("BQWrite")
                        .to(BigQuery.getTableReference("ddm_now_apps", true))
                        .withSchema(getSchema())
                        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

        p.run();
    }

score 0 · Answer 1 · answered Aug 17 '16 at 13:59

By definition, you can't read a file from a zip archive without unzipping it.

GreyBeardedGeek

Perhaps, but I don't want to access the hard drive more than I need to. I can certainly access the file without saving the unzipped file on the hard drive. – Daniel Lee Aug 17 '16 at 14:04
Sure, but that's not what you asked. You should update your question to clarify. – GreyBeardedGeek Aug 17 '16 at 14:16
I tried to ask it in the way that makes the most sense. I think that you know what I mean. – Daniel Lee Aug 17 '16 at 14:18

score 0 · Answer 2 · answered Aug 17 '16 at 14:21

Java has the ZipFile class (java.util.zip.ZipFile). Its entries() method returns an enumeration of the archive's entries, which you can search for the one you want. If you already know the entry's name and path inside the zip, you can call getEntry() directly instead.

Then, as the last step, call getInputStream() on that entry to read only the file you want.
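
The steps above could look roughly like the following. This is a minimal sketch, not a tested Dataflow integration: the class name ZipEntryReader and the helper readEntryLines are made up for illustration, the archive is assumed to be a local file whose path is passed as the first argument, and the entry is assumed to be UTF-8 text matched by the suffix hit_data.tsv.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named here for illustration only.
public class ZipEntryReader {

    /** Returns the lines of the first entry whose name ends with nameSuffix. */
    static List<String> readEntryLines(String zipPath, String nameSuffix) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            // Walk the archive's entries without extracting anything to disk.
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(nameSuffix)) {
                    continue;
                }
                // Decompress only the matching entry, straight into memory.
                // UTF-8 is an assumption about the file's encoding.
                List<String> lines = new ArrayList<>();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                }
                return lines;
            }
        }
        throw new IOException("No entry ending with " + nameSuffix + " in " + zipPath);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local path of the archive, e.g. _2016_08_17.zip
        for (String row : readEntryLines(args[0], "hit_data.tsv")) {
            System.out.println(row);
        }
    }
}
```

Note that ZipFile needs a file on the local file system. If the archive is only available as a stream (for example, when read from a bucket), java.util.zip.ZipInputStream can wrap any InputStream and be advanced with getNextEntry() until the wanted entry is reached.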

Koziołek
