9

I would like to write a method that reads several XML files inside a ZIP, from a single InputStream.

The method would open a ZipInputStream and, for each XML file, get the corresponding InputStream and pass it to my XML parser. Here is the skeleton of the method:

private void readZip(InputStream is) throws IOException {

    ZipInputStream zis = new ZipInputStream(is);
    ZipEntry entry = zis.getNextEntry();

    while (entry != null) {

        if (entry.getName().endsWith(".xml")) {

            // READ THE STREAM
        }
        entry = zis.getNextEntry();
    }
}

The problematic part is the "// READ THE STREAM" step. I have a working solution, which consists of copying the entry into a ByteArrayInputStream and feeding my parser with it. But it buffers the whole entry in memory, and for large files I get an OutOfMemoryError. Here is the code, in case anyone is interested:

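// Copies the whole current entry into memory before parsing.
int count;
byte[] buffer = new byte[2048];
ByteArrayOutputStream out = new ByteArrayOutputStream();
while ((count = zis.read(buffer)) != -1) { out.write(buffer, 0, count); }
// Named entryStream so it does not clash with the method parameter "is".
InputStream entryStream = new ByteArrayInputStream(out.toByteArray());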

The ideal solution would be to feed the parser with the original ZipInputStream. It should work, because it works if I just print the entry content with a Scanner:

Scanner sc = new Scanner(zis);
while (sc.hasNextLine())
{
    System.out.println(sc.nextLine());
}

But the parsers I have tried (jdom2, and also javax.xml.parsers.DocumentBuilderFactory) close the stream after parsing the data, so I am unable to get the next entry and continue.

So finally, my questions are:

  • Does anybody know a DOM parser that doesn't close its stream?
  • Is there another way to get an InputStream from a ZipEntry?

Thanks.

Tim Autin

4 Answers

7

A small improvement on Tim's solution: having to call allowToBeClosed() before close() makes it tricky to close the ZipInputStream properly when handling exceptions, and it breaks Java 7's try-with-resources statement.

I suggest creating a wrapper class as follows:

public class UncloseableInputStream extends InputStream {
  private final InputStream input;

  public UncloseableInputStream(InputStream input) {
    this.input = input;
  }

  @Override
  public void close() throws IOException {} // do not close the wrapped stream

  @Override
  public int read() throws IOException {
    return input.read();
  }

  // delegate all other InputStream methods as with read above
}

which can then safely be used as follows:

try (ZipInputStream zipIn = new ZipInputStream(...))
{
  DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
  ZipEntry entry;
  while (null != (entry = zipIn.getNextEntry()))
  {
    if ("file.xml".equals(entry.getName()))
    {
      Document doc = db.parse(new UncloseableInputStream(zipIn));
    }
  }
}
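
The "delegate all other InputStream methods" step can be avoided by extending java.io.FilterInputStream, which already forwards read, skip, available, mark and reset to the wrapped stream. The following is a minimal sketch under that assumption, not the code from the answer above: the names NonClosingStreamDemo, NonClosingInputStream, sampleZip and rootElementNames are hypothetical, and the ZIP is built in memory only so that the example runs without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NonClosingStreamDemo {

    /** Hypothetical wrapper: every call except close() is inherited from FilterInputStream. */
    static class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: whoever owns the wrapped stream closes it.
        }
    }

    /** Builds a small in-memory ZIP with two XML entries and one non-XML entry. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            addEntry(zos, "first.xml", "<a/>");
            addEntry(zos, "notes.txt", "not xml");
            addEntry(zos, "second.xml", "<b><c/></b>");
        }
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zos, String name, String content) throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        zos.write(content.getBytes(StandardCharsets.UTF_8));
        zos.closeEntry();
    }

    /** Parses every .xml entry and returns the root element names in archive order. */
    static List<String> rootElementNames(InputStream is) throws Exception {
        List<String> roots = new ArrayList<>();
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // try-with-resources closes the real ZipInputStream exactly once, at the end.
        try (ZipInputStream zis = new ZipInputStream(is)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().endsWith(".xml")) {
                    // If the parser closes its input, only the wrapper's no-op close() runs.
                    Document doc = db.parse(new NonClosingInputStream(zis));
                    roots.add(doc.getDocumentElement().getNodeName());
                }
            }
        }
        return roots;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElementNames(new ByteArrayInputStream(sampleZip())));
    }
}
```

Because the wrapper only shields close(), the ZipInputStream still reports end-of-stream at the end of each entry, so the parser stops at the entry boundary and getNextEntry() can move on.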
Tony Abbott
  • But your example class doesn't provide a way to ever close the InputStream. The stream should still be closeable, perhaps via a separate method (e.g. a forceClose() that delegates to close()). – rtcarlson Jul 31 '14 at 15:47
  • You could add a forceClose() method to UncloseableInputStream, but there is no need because you can just call zipIn.close(). And using zipIn.close() is better because, as in the above example, it plays nicely with try-with-resources. – Tony Abbott Sep 09 '14 at 09:06
4

Thanks to halfbit, I ended up with my own ZipInputStream subclass, which overrides the close() method:

import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

public class CustomZipInputStream extends ZipInputStream {

    private boolean _canBeClosed = false;

    public CustomZipInputStream(InputStream is) {
        super(is);
    }

    @Override
    public void close() throws IOException {

        if(_canBeClosed) super.close();
    }

    public void allowToBeClosed() { _canBeClosed = true; }
}
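
For illustration, here is one way the class above might be used. It is a sketch, not code from the original answer: the CustomZipUsageDemo, countXmlDocuments and sampleZip names are hypothetical, the class is repeated as a nested class only to keep the example in one file, and javax.xml.parsers.DocumentBuilder stands in for whichever parser is actually used. The try/finally makes sure allowToBeClosed() is called before the final close(), even if parsing throws.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class CustomZipUsageDemo {

    // Same class as above, nested here only to keep the sketch self-contained.
    static class CustomZipInputStream extends ZipInputStream {
        private boolean _canBeClosed = false;

        CustomZipInputStream(InputStream is) {
            super(is);
        }

        @Override
        public void close() throws IOException {
            if (_canBeClosed) super.close();
        }

        void allowToBeClosed() { _canBeClosed = true; }
    }

    /** Parses every .xml entry of the ZIP and returns how many documents were parsed. */
    static int countXmlDocuments(InputStream is) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        CustomZipInputStream zis = new CustomZipInputStream(is);
        int parsed = 0;
        try {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().endsWith(".xml")) {
                    db.parse(zis); // any close() issued by the parser is ignored here
                    parsed++;
                }
            }
        } finally {
            // Re-enable closing, then really close, whether or not parsing failed.
            zis.allowToBeClosed();
            zis.close();
        }
        return parsed;
    }

    /** Builds a tiny in-memory ZIP (two XML entries, one text entry) for the demo. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            String[][] entries = {
                {"one.xml", "<one/>"}, {"readme.txt", "plain text"}, {"two.xml", "<two/>"}
            };
            for (String[] e : entries) {
                zos.putNextEntry(new ZipEntry(e[0]));
                zos.write(e[1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countXmlDocuments(new ByteArrayInputStream(sampleZip())));
    }
}
```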
Tim Autin
3

You could wrap the ZipInputStream and intercept the call to close().

halfbit
0

If you don't mind external dependencies, Apache Commons IO provides a convenience class named CloseShieldInputStream for blocking the close() call.

private void readZip(InputStream is) throws IOException {

    ZipInputStream zis = new ZipInputStream(is);
    ZipEntry entry = zis.getNextEntry();

    while (entry != null) {

        if (entry.getName().endsWith(".xml")) {
            //commons-io 2.9 and later
            InputStream tempIs = CloseShieldInputStream.wrap(zis);
            //commons-io < 2.9
            //InputStream tempIs = new CloseShieldInputStream(zis);

            // READ THE STREAM

        }
        entry = zis.getNextEntry();
    }
}
jt.