I'm trying to write a function that dumps the results of a SQL select statement to a text file. The result set can be very large, and I'd like this to run as fast as possible.
With a large result set I also need to log, at a fixed interval, the total number of rows written and how many rows per second have been written since the last interval. I have a (map) that actually does the writing inside a (with-open), so I believe the logging side effect should happen there (see the comments in the code).
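Something like this sketch is the kind of interval logging I have in mind. All names here (make-progress-logger, tick!) are hypothetical, not from my actual code; it just counts rows in an atom and emits a message whenever report-seconds have elapsed:

```clojure
;; Hypothetical sketch: call the returned fn once per row written.
;; Every `report-seconds` it reports total rows and rows/sec via `log-fn`.
(defn make-progress-logger [report-seconds log-fn]
  (let [total      (atom 0)
        last-count (atom 0)
        last-time  (atom (System/currentTimeMillis))]
    (fn []
      (swap! total inc)
      (let [now     (System/currentTimeMillis)
            elapsed (/ (- now @last-time) 1000.0)]
        (when (>= elapsed report-seconds)
          (let [rows (- @total @last-count)]
            (log-fn (format "%d rows total, %.1f rows/sec"
                            @total (/ rows elapsed)))
            (reset! last-count @total)
            (reset! last-time now)))))))
```

The idea would be to build one of these before the write loop and invoke it alongside each (.write wrtr ...).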
My questions are:
- How do I log "rows per second" during each interval and "total rows so far"?
- Is there anything else I should keep in mind while writing large JDBC result sets to a file (or named pipe, bulk loader, etc.)?
- Does the (doall) around the (map) fetch all results, making it non-lazy and potentially memory-intensive?
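For the last question, the streaming alternative I'm considering is a (doseq), which consumes a lazy seq for side effects without building up a result list the way (doall (map ...)) does. A minimal self-contained sketch (the (range 5) seq is just a stand-in for my lazy query results):

```clojure
(require '[clojure.java.io :as io])

;; doseq walks the seq purely for side effects and does not retain
;; realized elements, unlike (doall (map ...)) which returns them all.
(let [results (map #(str % "\n") (range 5))] ; stand-in for the query seq
  (with-open [wrtr (io/writer "test.txt")]
    (doseq [row results]
      (.write wrtr row))))
```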
Would fixed-width output be possible as an option? I believe that would be faster for a named pipe feeding a bulk loader. The trade-off would be more disk I/O in exchange for less CPU spent on downstream parsing. However, this might require introspecting the result set's metadata (with .getMetaData?).
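For the metadata introspection, something like this is what I'm picturing: pull per-column display sizes from the java.sql.ResultSetMetaData and left-pad each field with format. Both function names are hypothetical, and I haven't verified that getColumnDisplaySize gives sensible widths for every column type:

```clojure
;; Hypothetical sketch: derive fixed widths from ResultSet metadata,
;; then pad each field to its column width.
(defn column-widths [^java.sql.ResultSet rs]
  (let [md (.getMetaData rs)]
    (mapv #(.getColumnDisplaySize md %)
          (range 1 (inc (.getColumnCount md))))))

(defn fixed-width-row [widths row]
  ;; "%-5s" left-justifies a value in a 5-character field, etc.
  (apply str (map #(format (str "%-" %1 "s") (str %2)) widths row)))
```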
(ns metadata.db.table-dump
  [:use [clojure.pprint]
        [metadata.db.connections]
        [metadata.db.metadata]
        [clojure.string :only (join)]
        [taoensso.timbre :only (debug info warn error set-config!)]]
  [:require [clojure.java.io :as io]
            [clojure.java.jdbc :as j]
            [clojure.java.jdbc.sql :as sql]])

(set-config! [:appenders :spit :enabled?] true)
(set-config! [:shared-appender-config :spit-filename] "log.log")

(let [field-delim    "\t"
      row-delim      "\n"
      report-seconds 10
      sql            "select * from comcast_lineup "
      joiner         (fn [v] (str (join field-delim v) row-delim))
      results        (rest (j/query local-postgres [sql]
                                    :as-arrays? true
                                    :row-fn joiner))]
  (with-open [wrtr (io/writer "test.txt")]
    (doall
     (map #(.write wrtr %)
          ;; Somehow in here I want to log with (info) the rows written
          ;; so far, and "rows per second" every 10 seconds.
          results)))
  (info "Completed write"))