
I have a zip archive that contains a bunch of plain text files. I want to parse each text file's data. Here's what I've written so far:

try {
    final ZipFile zipFile = new ZipFile(chooser.getSelectedFile());
    final Enumeration<? extends ZipEntry> entries = zipFile.entries();
    ZipInputStream zipInput = null;

    while (entries.hasMoreElements()) {
        final ZipEntry zipEntry = entries.nextElement();
        if (!zipEntry.isDirectory()) {
            final String fileName = zipEntry.getName();
            if (fileName.endsWith(".txt")) {
                zipInput = new ZipInputStream(new FileInputStream(fileName));
                final RandomAccessFile rf = new RandomAccessFile(fileName, "r");
                String line;
                while((line = rf.readLine()) != null) {
                    System.out.println(line);
                }
                rf.close();
                zipInput.closeEntry();
            }
        }
    }
    zipFile.close();
}
catch (final IOException ioe) {
    System.err.println("Unhandled exception:");
    ioe.printStackTrace();
    return;
}

Do I need a RandomAccessFile to do this? I'm lost at the point where I have the ZipInputStream.

Amir Afghani

1 Answer


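No, you don't need a RandomAccessFile. (Both it and the FileInputStream in your code open a file on disk named after the entry, not the entry's data inside the archive.) First get an InputStream with the data for this zip file entry: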

InputStream input = zipFile.getInputStream(entry);

Then wrap it in an InputStreamReader (to decode from binary to text) and a BufferedReader (to read a line at a time):

BufferedReader br = new BufferedReader(new InputStreamReader(input, "UTF-8"));

Then read lines from it as normal. Wrap all the appropriate bits in try/finally blocks as usual, too, to close all resources.
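Putting those steps together, a minimal sketch might look like the following. It assumes Java 7 or later (for try-with-resources, which replaces the try/finally blocks, and `StandardCharsets`) and that every `.txt` entry is UTF-8 encoded. The class name `ReadZipText` and the method `readTextLines` are hypothetical names for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ReadZipText {

    // Hypothetical helper: collects every line of every ".txt" entry in the archive.
    // Assumes the entries are UTF-8 encoded text.
    static List<String> readTextLines(File archive) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the ZipFile even if reading an entry fails
        try (ZipFile zipFile = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".txt")) {
                    continue;
                }
                // Read the entry's bytes from inside the archive, not from a file on disk,
                // then decode them to text and read one line at a time
                try (InputStream input = zipFile.getInputStream(entry);
                     BufferedReader br = new BufferedReader(
                             new InputStreamReader(input, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadZipText archive.zip
        for (String line : readTextLines(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```

With `chooser.getSelectedFile()` from the question passed in as the `File`, this replaces both the `ZipInputStream` and the `RandomAccessFile`.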

Jon Skeet
  • Hi Jon, I am looking for a similar solution; the only difference is that I would be reading text from different formats and using Tika to read these files. Any pointers? http://stackoverflow.com/questions/15667125/read-content-from-files-which-are-inside-zip-file – S Jagdeesh Mar 28 '13 at 06:44
  • @SJagdeesh: I don't know what Tika even is. I'll look at your question, but I don't know whether I'll be able to help. – Jon Skeet Mar 28 '13 at 06:45
  • What if there are multiple files and I also want their filenames? – bonapart3 Jan 22 '21 at 21:13
  • @bonapart3: Then you would use `entry.getName()`, as the code in the question already does. – Jon Skeet Jan 23 '21 at 07:30
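For the multiple-files case in the last comments, one possible sketch keeps each entry's lines keyed by `entry.getName()`. It makes the same assumptions as before (Java 7+, UTF-8 text entries); `ZipTextByName` and `readTextEntries` are hypothetical names, and the `LinkedHashMap` is just one choice that keeps entries in the order `ZipFile.entries()` returns them.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipTextByName {

    // Hypothetical helper: maps each ".txt" entry name to its lines (UTF-8 assumed).
    static Map<String, List<String>> readTextEntries(File archive) throws IOException {
        Map<String, List<String>> contents = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName(); // path inside the archive, e.g. "docs/a.txt"
                if (entry.isDirectory() || !name.endsWith(".txt")) {
                    continue;
                }
                List<String> lines = new ArrayList<>();
                try (BufferedReader br = new BufferedReader(new InputStreamReader(
                        zipFile.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                    }
                }
                contents.put(name, lines);
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipTextByName archive.zip
        for (Map.Entry<String, List<String>> e : readTextEntries(new File(args[0])).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " line(s)");
        }
    }
}
```

For very large archives, processing each entry's lines inside the loop instead of storing them all in the map avoids holding every file in memory at once.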