I'm trying to read millions of rows from a database and write to a text file.
This is a continuation of my earlier question, "database dump to text file with side effects".
My problem now seems to be that no logging happens until the program completes. Another indicator that I'm not processing lazily is that the text file isn't written at all until the program finishes.
Based on an IRC tip, my issue likely has to do with `:result-set-fn` defaulting to `doall` in the `clojure.java.jdbc/query` call.
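As I understand it, leaving `:result-set-fn` at its default is effectively the same as passing `doall` explicitly (using `db-connection`, `statement`, and `joiner` from my full code below), so the whole result set is realized in memory before `query` returns:

;; My understanding of the default: equivalent to passing doall explicitly,
;; so the entire result set is realized before `query` returns.
(j/query db-connection [statement]
         :as-arrays?    true
         :row-fn        joiner
         :result-set-fn doall)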
I have tried replacing this with a `for` comprehension, but memory consumption is still high: it pulls the entire result set into memory.
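Roughly what that attempt looked like (reconstructed from memory, so it may not be exactly what I ran):

;; A lazy `for` instead of doall. Memory use was just as high, so the
;; rows were evidently still being realized eagerly somewhere.
(j/query db-connection [statement]
         :as-arrays?    true
         :row-fn        joiner
         :result-set-fn (fn [rs] (for [row rs] row)))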
How can I supply a `:result-set-fn` that doesn't pull everything in the way `doall` does? And how can I write the output file progressively while the program runs, rather than dumping everything only after `-main` finishes?
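For contrast, this toy sketch (an ordinary lazy seq standing in for the DB rows) shows the row-by-row behavior I'm after:

(require '[clojure.java.io :as io])

;; Toy stand-in: each "row" is written (and flushed) as it is produced,
;; so the output file grows while the program runs.
(with-open [^java.io.Writer w (io/writer "demo.txt")]
  (doseq [row (map #(str % "\n") (range 5))]
    (.write w row)
    (.flush w)))

Here is my current code: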
;; Requires (assumed): [clojure.java.jdbc :as j], [clojure.java.io :as io],
;; [clojure.string :refer [join]], and a logging library providing `info`.
(let [db-spec             local-postgres
      sql                 "select * from public.f_5500_sf"
      log-report-interval 1000
      fetch-size          100
      field-delim         "\t"
      row-delim           "\n"
      db-connection       (doto (j/get-connection db-spec)
                            (.setAutoCommit false))
      statement           (j/prepare-statement db-connection sql
                                               :fetch-size fetch-size)
      joiner              (fn [v] (str (join field-delim v) row-delim))
      start               (System/currentTimeMillis)
      rate-calc           (fn [r] (float (/ r (/ (- (System/currentTimeMillis) start) 100))))
      row-count           (atom 0)
      result-set-fn       (fn [rs] (lazy-seq rs))
      lazy-results        (rest (j/query db-connection [statement]
                                         :as-arrays?    true
                                         :row-fn        joiner
                                         :result-set-fn result-set-fn))]
  (info "Started dbdump session...")
  (with-open [^java.io.Writer wrtr (io/writer "output.txt")]
    (info "Running query...")
    (doseq [row lazy-results]
      (swap! row-count inc)   ; count rows as they are written
      (.write wrtr row)))
  (info (format "Completed write with %d rows" @row-count)))