
I can get a text file as String with new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8). How do I achieve the same result if the file is in a folder which is in a zip file? I know I can get the zip as a ZipFile and the folder as a ZipEntry but I'm not clear on how I get the file nor how I make a String out of it. I don't want to create any files or folders to get it.

EDIT: Per dpr's answer, here's what I used:

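import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.Scanner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

String fileAsString;
try (ZipFile zip = new ZipFile(path)) {
    // Zip tools differ in the directory separator they store, so try both.
    ZipEntry entry = zip.getEntry("folder/file.txt");
    if (entry == null) entry = zip.getEntry("folder\\file.txt");
    if (entry == null) throw new FileNotFoundException("folder/file.txt not found in " + path);
    try (InputStream is = zip.getInputStream(entry);
         Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
        // "\\A" only matches at the start of the input, so next() returns the whole entry.
        fileAsString = s.hasNext() ? s.next() : "";
    }
}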
H.v.M.
  • You can do something like `zipFile.getInputStream(zipEntry)`. – W-S Nov 27 '16 at 08:56
  • You forgot to assign the entry in the `(entry == null)` if-statement. If backward slashes are used, your code will throw a `NullPointerException`. – dpr Nov 28 '16 at 14:46

2 Answers


Technically, there is no such thing as a directory inside a Zip file. Everything in a Zip file is an entry (ZipEntry in Java). You can use the isDirectory method to determine whether the current entry represents a directory of the zipped file-system structure or a regular file. The name attribute of a ZipEntry always reflects the full directory hierarchy of the originally zipped file, relative to the archive's root. That is, for a file Data\Folder1\example.txt you will typically have three ZipEntries in your zip file: one for Data, one for Data\Folder1 and one for Data\Folder1\example.txt (some tools do not write the directory entries at all).
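A minimal sketch of that idea (the class `ZipListing` and the method `describeEntries` are hypothetical names): it returns one line per entry together with what `isDirectory()` reports. Note that `ZipEntry.isDirectory()` only checks whether the name ends with `/`, so a directory entry written with a trailing backslash by some tool would be reported as a file.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipListing {
    /** Hypothetical helper: one line per entry, e.g. "DIR  Data/" or "FILE Data/Folder1/example.txt". */
    static List<String> describeEntries(Path zipPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // isDirectory() is true only when the stored name ends with '/'.
                lines.add((entry.isDirectory() ? "DIR  " : "FILE ") + entry.getName());
            }
        }
        return lines;
    }
}
```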

By simply iterating over the ZipEntries of your ZipFile and matching the path and file name of your desired file, you should easily find the desired entry. The contents of this entry can then be extracted using the already suggested ZipFile.getInputStream(ZipEntry) method.

See this question and its answers for examples of how to read an InputStream into a String.
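If you would rather not add a dependency, here is a minimal JDK-only sketch (Java 9+, because it uses `InputStream.readAllBytes()`); it mirrors the `new String(Files.readAllBytes(...), StandardCharsets.UTF_8)` idiom from the question. The names `ZipText` and `readEntry`, and the choice to return `null` for a missing or directory entry, are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipText {
    /** Hypothetical helper: reads one entry as UTF-8 text, or returns null if it is absent or a directory. */
    static String readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.isDirectory()) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Read the whole decompressed entry, then decode the bytes as UTF-8.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```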

Using Apache Commons IO (IOUtils) to read the InputStream into a String, this could look something like this:

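import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import org.apache.commons.io.IOUtils;

public String getFileContentsAsString(final File pZipFile, final String pFileName) throws IOException {

    try (ZipFile zipFile = new ZipFile(pZipFile)) {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements()) {
            ZipEntry currentEntry = entries.nextElement();
            if (matchesDesiredFile(pFileName, currentEntry)) {
                // Read the entry's bytes and decode them as UTF-8.
                try (InputStream entryIn = zipFile.getInputStream(currentEntry)) {
                    return IOUtils.toString(entryIn, StandardCharsets.UTF_8);
                }
            }
        }
    }

    // No regular file with that name exists in the archive.
    return null;
}

private boolean matchesDesiredFile(final String pFileName, final ZipEntry pZipEntry) {
    return !pZipEntry.isDirectory() && pZipEntry.getName().equals(pFileName);
}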

If you're simply matching against the name attribute of the entry, you could of course also use

ZipEntry zipEntry = zipFile.getEntry(filePathWithinZipArchive);

to get the desired entry instead of iterating over the entries "manually".
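As a JDK-only alternative (a different technique from the ZipFile approach above), the zip file system provider that ships with the JDK since Java 7 lets you open the archive as a `FileSystem`, so the question's own `Files.readAllBytes` idiom works on a path inside the zip. A minimal sketch; the class `ZipFsText` and method `read` are hypothetical names. Paths in a zip file system always use `/` as the separator.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

final class ZipFsText {
    /** Hypothetical helper: reads a file inside the archive through the zip file system provider. */
    static String read(Path zipPath, String pathInZip) throws IOException {
        // The (ClassLoader) cast avoids an ambiguous overload on Java 13+.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipPath, (ClassLoader) null)) {
            Path inner = zipFs.getPath(pathInZip);
            // Same idiom as for a regular file on disk; nothing is extracted.
            return new String(Files.readAllBytes(inner), StandardCharsets.UTF_8);
        }
    }
}
```

A missing path surfaces as an `IOException` from `Files.readAllBytes` rather than as a `null` return.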

Note that you should be careful about the separator character used for directories. As pointed out here, it's up to the application that creates the zip file to use either \ (backslash) or / (forward slash) as the directory separator. I tried this on a Mac using the zip terminal command, and both the ZipEntry's name and the original file name were Data/Folder1/example.txt. If you create the zip using a different tool, the name of the ZipEntry might be Data\Folder1\example.txt. Even mixed variants (one ZipEntry using forward slashes and another using backslashes) are possible. You may want to consider this if you have no control over the zip creation process.
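If you do not control how the archive was created, one way to cope with backslashes, forward slashes and mixed names is to normalize every entry name before comparing. A minimal sketch, assuming UTF-8 content and Java 9+ (for `readAllBytes()`); the class `SeparatorTolerantZip` and method `readEntry` are hypothetical names. Since `ZipEntry.isDirectory()` only recognizes a trailing `/`, the sketch checks the normalized name for directories instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class SeparatorTolerantZip {
    /** Hypothetical helper: finds a file entry whether it was stored with '/' or '\' separators. */
    static String readEntry(Path zipPath, String wantedPath) throws IOException {
        String wanted = wantedPath.replace('\\', '/');
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Treat '\' and '/' as equivalent, even within a single name.
                String name = entry.getName().replace('\\', '/');
                // Skip directory entries (trailing separator) and non-matching names.
                if (!name.endsWith("/") && name.equals(wanted)) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    }
                }
            }
        }
        return null; // no matching file entry
    }
}
```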

dpr
  • Thanks! I did it on Windows and the `ZipEntry` name was `folder/file.txt`. It's probably platform independent. – H.v.M. Nov 28 '16 at 11:59
  • 1
  • It seems to depend on the client used to create the zip file, not on the platform. Nice clients will use forward slashes, while not-so-nice ones might use backward slashes (see [this question and answer on SO](http://stackoverflow.com/questions/13846000/file-separators-of-path-name-of-zipentry)). However, you will need to support both variants (even mixed ones are possible) if you have no control over the creation process of the zip files you are working on. – dpr Nov 28 '16 at 12:02
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import org.apache.commons.io.IOUtils;

/**
 * Return zip file entry as string, or null if the archive has no such entry.
 */
public static String zipEntryToString(Path zipFile, String entryName, Charset cs)
    throws IOException {
    try (ZipFile zf = new ZipFile(zipFile.toFile())) {
        ZipEntry ze = zf.getEntry(entryName);
        if (ze == null) {
            return null;
        }
        try (InputStream is = zf.getInputStream(ze)) {
            return IOUtils.toString(is, cs);
        }
    }
}

https://docs.leponceau.org/maven/com.github.jjYBdx4IL.utils/io-utils/latest/apidocs/com/github/jjYBdx4IL/utils/io/ZipUtils.html#zipEntryToString(java.nio.file.Path,java.lang.String,java.nio.charset.Charset)

user1050755