9

I'm trying to rewrite following snippet in clojure, but it all comes out ugly, maybe someone will suggest a more elegant solution?

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipFileRdrExp {

  public static void main(String[] args) {

    try {

      FileInputStream fis = new FileInputStream("C:\\MyZip.zip");
      ZipInputStream zis = new ZipInputStream(fis);
      ZipEntry ze;
      while((ze=zis.getNextEntry())!=null){
        System.out.println(ze.getName());
        zis.closeEntry();
      }

      zis.close();

    } catch (FileNotFoundException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

Here is my ugly try with duplicate call to getNextEntry:

(ns app.core
  (:import
  (java.io FileInputStream FileNotFoundException IOException File)
  (java.util.zip ZipInputStream ZipEntry)))


(defn- read-zip [zip-file]
  (let [fis (FileInputStream. zip-file)
        zis (ZipInputStream. fis)]
    (loop [ze (.getNextEntry zis)]
      (when ze
        (println (.getName ze))
        (.closeEntry zis)
        (recur (.getNextEntry zis))))
    (.close zis)))
omiel
  • 1,573
  • 13
  • 16
Dfr
  • 4,075
  • 10
  • 40
  • 63
  • 2
    It's a bit unfortunate that your example does not explicitly ask about doing anything with the contents for the `ZipEntry`, only printing the entries name. IMHO the more intricate and error-prone details of the `ZipInputStream` API are how to retrieve the `InputStream` for the entry itself, how to iterate correctly between entries, and finally how to close the `ZipInputStream` properly. It is also unfortunate because all the answers below get away with an easy and less convoluted, but meanwhile a significantly less useful solution (i.e. reading the content). – Daniel Dinnyes Mar 28 '13 at 18:15

5 Answers5

19

I would go with something like the following:

(defn entries [zipfile]
 (lazy-seq
  (if-let [entry (.getNextEntry zipfile)]
   (cons entry (entries zipfile)))))

(defn walkzip [fileName]
 (with-open [z (ZipInputStream. (FileInputStream. fileName))]
  (doseq [e (entries z)]
   (println (.getName e))
   (.closeEntry z))))

EDIT: the above code was eventually tested and corrected.

EDIT: the following works as expected and it's much more concise, even though it uses a different Java API

(defn entries [zipfile]
  (enumeration-seq (.entries zipfile)))

(defn walkzip [fileName]
  (with-open [z (java.util.zip.ZipFile. fileName)]
             (doseq [e (entries z)]
                    (println (.getName e)))))
omiel
  • 1,573
  • 13
  • 16
skuro
  • 13,414
  • 1
  • 48
  • 67
  • 3
    Beware: if `entries` is returned instead of immediately consumed, `with-open` will conflict with the lazy evaluation and close your input stream before the read is finished, triggering an exception. – Brad Koch Jun 04 '14 at 16:10
6

This is a simpler example:

(defn filenames-in-zip [filename]
  (let [z (java.util.zip.ZipFile. filename)] 
    (map #(.getName %) (enumeration-seq (.entries z)))))

This is similar to the code above, but there is no reason to use with-open here. This example returns a sequence of data that you can then print out or better yet, format. It's better to have the function that extracts the data just return data rather than having the side effect of printing contained inside that function.

If you want to print the contents out you can use

(pprint (filenames-in-zip "my.zip"))

and it will give you a nice list.

Brad Koch
  • 19,267
  • 19
  • 110
  • 137
abedra
  • 384
  • 1
  • 4
  • 2
    While you are not required to directly close a ZipFile, it's [strongly advised](http://download.oracle.com/javase/6/docs/api/java/util/zip/ZipFile.html#finalize()) to do that in order to free up resources as soon as you don't need them anymore. – skuro Mar 25 '11 at 06:29
  • Yes, closing the resource is a good idea. The question/example is so simple though that it really doesn't warrant adding the extra macro without explaining what it does. – abedra Mar 25 '11 at 13:19
  • Using [`with-open`](http://clojuredocs.org/clojure_core/clojure.core/with-open) in place of `let` would handle automatic closing of the ZipFile. – Brad Koch Jun 04 '14 at 20:43
  • 1
    @BradKoch But this returns a lazy sequence, so it has the flaws that you describe in your comments to the accepted answer. You may wrap the `map` with `doall` so that the zip file is completely consumed before it is closed, though effectively putting its entire content into memory. – omiel Jun 06 '14 at 16:23
  • @omiel , agreed, one should handle both the file closing and the lazy sequence, a delicate situation to manage. – Brad Koch Jun 06 '14 at 17:37
0

This is similar to skuro's answer that uses ZipInputStream, but a slightly more terse definition of entries.

(defn entries [zip-stream]
  (take-while #(not (nil? %))
              (repeatedly #(.getNextEntry zip-stream))))

(defn walkzip [fileName]
  (with-open [z (ZipInputStream. (FileInputStream. fileName))]
             (doseq [e (entries z)]
                    (println (.getName e))
                    (.closeEntry z))))

Or, if you want to actually extract the files you need another helper function for copying. I've used clojure.java.io for shortening the code, but the same thing could be accomplished without this dependency.

(require '[clojure.java.io :as io])

(defn entries [zip-stream]
  (take-while #(not (nil? %))
              (repeatedly #(.getNextEntry zip-stream))))

(defn copy-file [zip-stream filename]
  (with-open [out-file (file-out-stream filename)]
             (let [buff-size 4096
                             buffer (byte-array buff-size)]
               (loop [len (.read zip-stream buffer)]
                     (when (> len 0)
                       (.write out-file buffer 0 len)
                       (recur (.read zip-stream buffer)))))))

(defn extract-stream [zip-stream to-folder]
  (let [extract-entry (fn [zip-entry]
                          (when (not (.isDirectory zip-entry))
                            (let [to-file (io/file to-folder
                                                   (.getName zip-entry))
                                          parent-file (io/file (.getParent to-file))]
                              (.mkdirs parent-file)
                              (copy-file zip-stream to-file))))]
    (->> zip-stream
      entries
      (map extract-entry)
      dorun)))

This is effectively equivalent to simply unzipping the file with an unzip utility. The beauty of it is that since the entries are in a lazy seq, you can filter or drop or take to your hearts (or requirements) content. Well, I'm pretty sure you can. Haven't really tried it yet :)

Also do note. You MUST process the seq inside of the function where you open the zip stream!!!

Zefira
  • 4,329
  • 3
  • 25
  • 31
  • On the surface this solution looks really sweet, sadly it has a flaw :-( The `entries` function seems to run twice the first time before `copy-file` gets to do its job. `.getNextEntry` moves the position of the pointer in the stream the first file name gets the second files content and the last file will be empty. It took me a while to figure out. Maybe it is how `take-while` / `repeatedly` works together ¯\_(ツ)_/¯ – Jacob Dec 06 '19 at 13:52
  • The flawed behavior can be simulated like this: ``` (defn a [] (take-while #(not= 1 %) (repeatedly #(let [found (rand-int 3)] (println "found" found) found)))) (loop [[b & rest] (a)] (when b (println b) (recur rest))) ``` – Jacob Dec 06 '19 at 14:02
  • 1
    @Jacob thanks for the comments, I can't reproduce the behavior that you're describing however. I'm also unclear as to what your flawed behavior simulation should be outputting versus what it is outputting. i.e. I would be happy to address your concerns with my solution, but as far as I can tell, it does work correctly. – Zefira Dec 18 '19 at 00:20
  • This should give you an idea: https://repl.it/repls/BothPortlyInteger – Jacob Dec 20 '19 at 05:02
  • did my new example clarify the problem for you? You are welcome to contact me directly for more direct communication. – Jacob Dec 30 '19 at 16:19
0

Clojure-Contrib has libraries IO and Jar, which make the code shorter:

(require 'clojure.contrib.jar
         'clojure.contrib.io)

(import [java.util.jar JarFile])

(defn- read-zip [zip-file]
  (clojure.contrib.jar/filenames-in-jar (JarFile. (clojure.contrib.io/file zip-file))))

Caveat: Function filenames-in-jar does not list directory entries in the zip file, only names of actual files.

omiel
  • 1,573
  • 13
  • 16
Leonel
  • 28,541
  • 26
  • 76
  • 103
  • 2
    Pulling in contrib doesn't really give you much here. It doesn't really make the code shorter, and it adds a dependency when you don't really need one. – abedra Apr 08 '11 at 13:03
0

My preferred solution is to create a lazy-seq of [#ZipEntry, #InputStream] from the zip file.

(defn lazy-zip
  "returns a lazy-seq of [entry inputstream] for a zip file

  The zipfile will be closed when the seq is exhausted. All processing has to be done transient through `map` or similar methods."
  [filename]
  (let [zf (java.util.zip.ZipFile. filename)]
    (letfn [(helper [entries]
              (lazy-seq
               (if-let [s (seq entries)]
                 (cons [(first entries)
                        (.getInputStream zf (first entries))]
                       (helper (rest entries)))
                 (do (println "closing zipfile") (.close zf) nil))))]
      (helper (->> zf (.stream) (.toArray))))))

Here is the test that shows the usage:

(deftest test-lazy-zip
  (testing "sample zip is read correctly"
    (is (=
         '(["sample.xml" "<?xml version=\"1.0\" encoding=\"UTF-8\">" "<foo>" "<bar>" "<baz>The baz value</baz>" "</bar>" "</foo>"])
         (map (fn [[entry reader]]
                (into [(.getName entry)]
                      (line-seq (stream-to-buffered-reader reader))))
              (lazy-zip "sample.xml.zip"))))))