Is there a simple way to unzip a file on disk to a directory using Clojure? Everything I've found is about unzipping a single file, but my zip contains several. I'm trying to build an ETL tool that downloads a zip file, unzips it, reads a specific file into memory, and then processes it. Ideally I could use `.getNextEntry` or something similar and read an entry into memory when its name matches a regex.
Does this answer your question? [Read Content from Files which are inside Zip file](https://stackoverflow.com/questions/15667125/read-content-from-files-which-are-inside-zip-file) – cfrick Jan 28 '20 at 16:35
1 Answer
I found unzipping the contents of a zip file surprisingly hard to do in Clojure or Java, because you have to create the directory structure yourself. I imagine there are ready-made libraries out there, but I once ended up using something like the following:
```clojure
;; in project.clj or build.boot add dependency
;; [org.apache.commons/commons-compress "1.3"]
(ns com.example.my.namespace
  (:require
   [clojure.java.io :as io]
   [clojure.tools.logging :as log])
  (:import
   [org.apache.commons.compress.archivers.zip ZipFile ZipArchiveEntry]))

;; Note that make-parents is called for every file. Tested with ZIP
;; with ca 80k files. Not significantly slower than testing here for
;; .isDirectory and only then create the parents. Keeping the nicer
;; code with this comment, then.
(defn- unzip-file [zip-file to-dir]
  (log/infof "Extracting %s" zip-file)
  (log/debug "  to:" to-dir)
  (with-open [zipf (ZipFile. (io/file zip-file))]
    (doseq [entry (enumeration-seq (.getEntries zipf))
            :when (not (.isDirectory ^ZipArchiveEntry entry))
            :let [zip-in-s (.getInputStream zipf entry)
                  out-file (io/file (str to-dir
                                         java.io.File/separator
                                         (.getName entry)))]]
      (log/trace "  ->" (.getName out-file))
      (io/make-parents out-file)
      (with-open [entry-o-s (io/output-stream out-file)]
        (io/copy zip-in-s entry-o-s)))))
```
The repeated calls to `io/make-parents` didn't make a difference in my tests.
I'm not sure this is really what you want to use, though. Reading just one file from the zip can probably be done without extracting everything else. I'd still iterate over the zip file's entries the same way, but skip creating the output structure and, instead of using `io/copy` to stream the contents from inside the zip file to another file on disk, `slurp` from the entry's input stream.
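As a rough illustration of that last idea, here is a minimal sketch that reads the first entry whose name matches a regex straight into memory. It makes some assumptions that are not in the code above: it uses the JDK's `java.util.zip.ZipFile` instead of commons-compress so that it needs no extra dependency, the namespace and the function name `slurp-zip-entry` are made up for the example, and the entry is assumed to be text that `slurp` can decode (UTF-8 by default).

```clojure
(ns example.zip-read                      ; hypothetical namespace
  (:require [clojure.java.io :as io])
  (:import [java.util.zip ZipFile ZipEntry]))

(defn slurp-zip-entry
  "Returns the contents of the first non-directory entry in zip-file
  whose name matches the regex pattern, as a string, or nil if no
  entry matches. Nothing is written to disk."
  [zip-file pattern]
  (with-open [zipf (ZipFile. (io/file zip-file))]
    ;; `first` forces the lazy seq while the ZipFile is still open.
    (when-let [^ZipEntry entry
               (->> (enumeration-seq (.entries zipf))
                    (remove #(.isDirectory ^ZipEntry %))
                    (filter #(re-find pattern (.getName ^ZipEntry %)))
                    first)]
      ;; slurp reads the whole entry eagerly, so the string is complete
      ;; before the stream and the ZipFile are closed.
      (with-open [in (.getInputStream zipf entry)]
        (slurp in)))))
```

With a hypothetical downloaded file, a call could look like `(slurp-zip-entry "download.zip" #"\.csv$")`. For large or binary entries you would probably want to hand the input stream to a parser instead of slurping it into a string.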

Stefan Kamphausen