19

Say we have code like:

File file = new File("zip1.zip");
ZipInputStream zis = new ZipInputStream(new FileInputStream(file));

Let's assume you have a .zip file that contains the following:

  • zip1.zip
    • hello.c
    • world.java
    • folder1
      • foo.c
      • bar.java
    • foobar.c

How would zis.getNextEntry() iterate through that?

Would it return hello.c, world.java, folder1, foobar.c and completely ignore the files in folder1?

Or would it return hello.c, world.java, folder1, foo.c, bar.java, and then foobar.c?

Would it even return folder1 since it's technically a folder and not a file?

Thanks!

Martijn Courteaux
joshualan
  • ZipEntry can represent a directory too. See the [isDirectory()](http://docs.oracle.com/javase/6/docs/api/java/util/zip/ZipEntry.html#isDirectory()) method. – Bobulous Aug 02 '12 at 19:16

5 Answers

30

Well... Let's see:

        // try-with-resources closes the stream even if reading fails
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream("C:\\New Folder.zip")))
        {
            ZipEntry temp = null;
            while ( (temp = zis.getNextEntry()) != null )
            {
                System.out.println( temp.getName());
            }
        }

Output:

New Folder/
New Folder/folder1/
New Folder/folder1/bar.java
New Folder/folder1/foo.c
New Folder/foobar.c
New Folder/hello.c
New Folder/world.java

Zoop
17

Yes. It will print the folder name too, since it's also an entry within the zip. It will also print in the same order as it is displayed inside the zip. You can use the test below to verify your output.

public class TestZipOrder {
    @Test
    public void testZipOrder() throws Exception {
        File file = new File("/Project/test.zip");
        // try-with-resources closes the stream when the test finishes
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(file))) {
            ZipEntry entry = null;
            while ( (entry = zis.getNextEntry()) != null ) {
                System.out.println( entry.getName());
            }
        }
    }
}
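Since folder entries come back alongside file entries, it can help to tell them apart with `ZipEntry.isDirectory()`. The following is only a minimal sketch: the class name `ZipEntryKinds` and the helpers `describeEntries` and `sampleZip` are made up for illustration, and the zip is built in memory so nothing on disk is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryKinds {

    // Hypothetical helper: labels each entry as a directory or a file, in stream order.
    static List<String> describeEntries(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                result.add((entry.isDirectory() ? "[dir] " : "[file] ") + entry.getName());
            }
        }
        return result;
    }

    // Hypothetical helper: builds a small sample zip in memory.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("folder1/")); // trailing slash marks a directory entry
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("folder1/foo.c"));
            zos.write("int x;".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("hello.c"));
            zos.write("int y;".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String line : describeEntries(new ByteArrayInputStream(sampleZip()))) {
            System.out.println(line);
        }
    }
}
```

Skipping the `[dir]` entries is then a matter of checking `isDirectory()` before processing an entry.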
Manisha Mahawar
  • I am getting a zip file whose entries are pages of a document, named `[filename]-[page-number].jpg`. After getting the zip file, I read it in sequence, expecting to get the files from page 1 to the last page. However, I see that the order of the files is sometimes mixed up. For example, pages come out in this order: 1, 2, 4, 3, 5, 6, 7, 9, 8, 10. Why does this happen? – efirat Mar 26 '18 at 07:15
  • 1
    Every record in a zip has its "offset" from the beginning. getNextEntry returns items in order of this offset (starting at offset 0). Some software (like [7zip](https://www.7-zip.org/)) shows this offset when you open a zip file... – Peter Spireng Jul 13 '18 at 05:10
4

Excerpt from: https://blogs.oracle.com/CoreJavaTechTips/entry/creating_zip_and_jar_files

java.util.zip libraries offer some level of control for the added entries of the ZipOutputStream.

First, the order you add entries to the ZipOutputStream is the order they are physically located in the .zip file.

You can manipulate the enumeration of entries returned back by the entries() method of ZipFile to produce a list in alphabetical or size order, but the entries are still stored in the order they were written to the output stream.

So I believe you have to use the entries() method to see the order in which the entries will be iterated through.

    // try-with-resources closes the ZipFile when done
    try (ZipFile zf = new ZipFile("your file path with file name")) {
        for (Enumeration<? extends ZipEntry> e = zf.entries(); e.hasMoreElements();) {
            System.out.println(e.nextElement().getName());
        }
    }
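The excerpt mentions turning the enumeration from entries() into an alphabetical list. One way that might look is sketched below; `SortedZipListing`, `sortedNames`, and `writeSampleZip` are illustrative names, not part of any library, and the sample archive is written to a temporary file because `ZipFile` needs a file on disk.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class SortedZipListing {

    // Hypothetical helper: collects entry names and sorts them alphabetically.
    // Only the returned list is reordered; the archive itself is untouched.
    static List<String> sortedNames(Path zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zipPath.toFile())) {
            for (ZipEntry e : Collections.list(zf.entries())) {
                names.add(e.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Hypothetical helper: writes three empty entries in non-alphabetical order.
    static Path writeSampleZip() throws IOException {
        Path zipPath = Files.createTempFile("sorted-listing", ".zip");
        try (OutputStream out = Files.newOutputStream(zipPath);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (String name : new String[] {"world.java", "hello.c", "foobar.c"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        }
        return zipPath;
    }

    public static void main(String[] args) throws IOException {
        Path zipPath = writeSampleZip();
        try {
            System.out.println(sortedNames(zipPath)); // [foobar.c, hello.c, world.java]
        } finally {
            Files.delete(zipPath);
        }
    }
}
```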
Vikram
1

A zip file stores its contents as a "flat" list of entries covering all the files and directories in the zip. getNextEntry will iterate through that list sequentially, returning every file and directory entry in the zip file.

There is a variant of the zip file format that has no central directory, in which case (if it's handled at all) I suspect you'd iterate through all actual files in the zip, skipping directories (but not skipping files in directories).
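Whether a folder shows up at all depends on whether the tool that wrote the archive stored an entry for it. As a rough illustration (the class and method names here are invented for the example), `ZipOutputStream` will happily write `folder1/foo.c` without a separate `folder1/` entry, and reading that archive back yields only the file entries:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ImplicitDirectories {

    // Hypothetical helper: writes two file entries and no "folder1/" entry.
    static byte[] zipWithoutDirectoryEntries() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"folder1/foo.c", "hello.c"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: entry names in the order getNextEntry() returns them.
    static List<String> namesInStreamOrder(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // No "folder1/" line appears, because no such entry was ever written.
        System.out.println(namesInStreamOrder(zipWithoutDirectoryEntries()));
    }
}
```

Code that extracts such an archive therefore usually needs to create parent directories itself rather than rely on a directory entry arriving first.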

Hot Licks
0

I'd like to explain in more detail why Vikram's answer is correct and the accepted and highest scoring answers are potentially misleading or incomplete.

As Vikram stated (correctly):

The order you add entries to the ZipOutputStream is the order they are physically located in the .zip file.

I found this part of the accepted answer to be potentially misleading:

It will also print in the same order as it is displayed inside the zip.

One gotcha I'd like to note is that files within a folder are not guaranteed to be returned in succession.

Different tools display zip file contents in different orders. In particular, tools with a GUI typically show all files in a folder together, whereas within the zip itself those files may be scattered throughout.

Let's walk through the two scenarios below.


Scenario 1: Files in Tree Order

Starting with a file tree like this

$ tree
├── folder1
│   ├── bar.java
│   └── foo.c
├── foobar.c
├── hello.c
└── world.java

Now let's create a zip from this directory using the Linux zip command.

$ zip -r zip1.zip ./*
  adding: folder1/ (stored 0%)
  adding: folder1/foo.c (stored 0%)
  adding: folder1/bar.java (stored 0%)
  adding: foobar.c (stored 0%)
  adding: hello.c (stored 0%)
  adding: world.java (stored 0%)

Note the order in which these files were added. This is the same order in which they will be iterated through when using ZipInputStream. Test that with the following script (borrowed from Zoop's answer):

jshell> import java.util.zip.*;
   ...> ZipInputStream zis = new ZipInputStream(new FileInputStream("./zip1.zip"));
   ...> ZipEntry temp = null;
   ...> while ( (temp = zis.getNextEntry()) != null )
   ...> {
   ...>     System.out.println(temp.getName());
   ...> }
folder1/
folder1/foo.c
folder1/bar.java
foobar.c
hello.c
world.java

Scenario 2: Files Scattered

If we add the files in a different order, then they will be iterated through in a different order. Let's try that by using the zip command to create a new file to which we add files one by one:

zip -m zip1.zip ./hello.c
zip -m zip1.zip ./folder1
zip -m zip1.zip ./folder1/foo.c
zip -m zip1.zip ./world.java
zip -m zip1.zip ./foobar.c
zip -m zip1.zip ./folder1/bar.java # Notice we're adding this LAST

Now if we iterate through the files you'll see that folder1/bar.java is the last entry because it was added last. It will not be returned alongside the other folder contents. The directory entry will not be duplicated or immediately precede it.

hello.c
folder1/
folder1/foo.c
world.java
foobar.c
folder1/bar.java
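The same effect can be reproduced from Java without the zip command. This is a sketch only, with made-up names (`WriteOrderDemo`, `zipInOrder`, `readOrder`) and placeholder file contents: it writes entries in the scattered order above with ZipOutputStream and reads them back with ZipInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class WriteOrderDemo {

    // Hypothetical helper: writes one entry per name, in exactly the order given.
    static byte[] zipInOrder(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                if (!name.endsWith("/")) {
                    // Placeholder content; directory entries carry no data.
                    zos.write(name.getBytes(StandardCharsets.UTF_8));
                }
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: entry names in the order getNextEntry() returns them.
    static List<String> readOrder(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipInOrder("hello.c", "folder1/", "folder1/foo.c",
                "world.java", "foobar.c", "folder1/bar.java");
        // Entries come back in write order, so folder1/bar.java is last.
        readOrder(zip).forEach(System.out::println);
    }
}
```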

The zip spec is quite complex so be very careful when using ZipInputStream.

Jeremy