I need to write a function that splits records into separate files based on the value of a field. E.g. given the input:
    [
     ["Paul" "Smith" 35]
     ["Jason" "Nielsen" 39]
     ["Charles" "Brown" 22]
    ]
We end up with a file "Paul", containing "Paul Smith 35", a file "Jason", containing "Jason Nielsen 39", etc.
I don't know the names in advance, so I need to keep references to the writers so that I can close them at the end.
The best I could come up with was using a ref to keep the writers, like this:
    ;; needed for the io/ alias used below
    (require '[clojure.java.io :as io])

    (defn write-split [records]
      (let [out-dir (io/file "/tmp/test/")
            open-files (ref {})]
        (try
          (.mkdirs out-dir)
          (dorun
            (for [[fst lst age :as rec] records]
              (binding [*out* (or
                                (@open-files fst)
                                (dosync
                                  (alter open-files assoc fst (io/writer (str out-dir "/" fst)))
                                  (@open-files fst)))]
                (println (apply str (interpose " " rec))))))
          (finally (dorun (map #(.close %) (vals @open-files)))))))
This works, but feels horrible and, more importantly, runs out of heap, even though I only have five output files, all of which are opened at the very beginning. It seems like something is being retained somewhere...
Can anyone think of a more functional and Clojure-like solution?
EDIT: The input is big, potentially gigabytes of data, hence the importance of memory efficiency and my reluctance to close the files after every write.
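For reference, here is a minimal sketch of the shape I have in mind, not a verified fix for the heap problem. It assumes the first field is a string that is safe to use as a file name, takes the output directory as a parameter (the `out-dir` argument is my addition), and swaps the ref for an atom and `for`/`binding` for `doseq` with direct `.write` calls. Writers are opened on first use, cached by name, and all closed in `finally`.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn write-split
  "Writes each record as a space-separated line to a file named after its
  first field, inside out-dir. Each writer is opened the first time its
  name is seen and stays open until all records have been processed."
  [out-dir records]
  (let [dir     (io/file out-dir)
        writers (atom {})]          ; file name -> open writer
    (.mkdirs dir)
    (try
      ;; doseq walks the sequence eagerly, one record at a time,
      ;; without building a result sequence.
      (doseq [[fst :as rec] records]
        (let [^java.io.Writer w (or (get @writers fst)
                                    (let [new-w (io/writer (io/file dir fst))]
                                      (swap! writers assoc fst new-w)
                                      new-w))]
          (.write w (str (string/join " " rec) "\n"))))
      (finally
        ;; Close every writer that was opened, even if a write failed.
        (doseq [^java.io.Closeable w (vals @writers)]
          (.close w))))))
```

It would be called as `(write-split "/tmp/test" records)`. With gigabytes of input, `records` would have to be a lazy sequence whose head is not held anywhere else by the caller, otherwise the realized elements cannot be garbage collected regardless of how the writing is done.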