7

I have a query that is basically a select *. In development this table is only 30000 rows, but in production it will much bigger. So I want to consume this query lazily. Why is the query below not lazy? I am using Postgres 9.5.4.1.

(do
  (def pg-uri {:connection-uri "jdbc:postgresql://localhost/..."})
  (def row (atom 0))
  (take 10 (clojure.java.jdbc/query 
          pg-uri
          ["select * from mytable"]
          {:fetch-size 10
           :auto-commit false
           :row-fn (fn [r] (swap! row inc))}))
  @row) ;;=> 300000
Michiel Borkent
  • 34,228
  • 15
  • 86
  • 149
  • Maybe this older S.O. question and answers would be relevant, not sure: http://stackoverflow.com/questions/19728538/clojure-java-jdbc-query-large-resultset-lazily – Josh Sep 29 '16 at 15:32

3 Answers3

9

First, see https://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor.

Solved it like this.

(jdbc/with-db-transaction [tx connection]
  (jdbc/query tx
    [(jdbc/prepare-statement (:connection tx)
                              "select * from mytable"
                              {:fetch-size 10})]
     {:result-set-fn (fn [result-set] ...)}))

where :result-set-fn is a function that consumes the lazy result set.

with-db-transaction takes care of autoCommit set to false. :fetch-size is not passed from query so you have to make a prepare-statement yourself.

Michiel Borkent
  • 34,228
  • 15
  • 86
  • 149
8

clojure.java.jdbc supports lazy processing of large result sets natively these days (the other answers here predate that native support). See the community documentation about it here:

http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql#processing-a-result-set-lazily

In particular, see the Additional Options? section for database-specific tweaks you might need. You can specify :auto-commit? false on any function that would open a new connection, and you can specify :fetch-size and the various cursor controls on any query-related function. See this StackOverflow question & answer for details of what PostgreSQL might need:

Java JDBC ignores setFetchSize?

Currently, you'll have to dig in the clojure.java.jdbc source or the prepare-statement reference documentation for more of those options. I'm continuing to work on the community documentation to surface all of that information.

Sean Corfield
  • 6,297
  • 22
  • 31
0

You don't need the transaction and prepared-statement stuff above. It is the use of :result-set-fn that causes the lazy-sequence to be consumed. You may have meant to use :row-fn instead.

For full details please see The Clojure Cookbook. A print version is also available, which I recommend.

The jdbc/query function has several optional keyword parameters that control how it constructs the returned result set. The :result-set-fn parameter specifies a function that is applied to the entire result set (a lazy sequence) before it is returned. The default argument is the doall function:

(defn hi-lo [rs] [(first rs) (last rs)])

;; Find the highest- and lowest-cost fruits
(jdbc/query db-spec
            ["select * from fruit order by cost desc"]
            :result-set-fn hi-lo)
;; -> [{:grade nil, :unit nil, :cost 77, :appearance nil, :name "Kumquat"}
;;     {:grade 1.4, :unit nil, :cost 10, :appearance "rotten", :name "Tomato"}]

The :row-fn parameter specifies a function that is applied to each result row as the result is constructed. The default argument is the identity function:

(defn add-tax [row] (assoc row :tax (* 0.08 (row :cost))))

(jdbc/query db-spec
             ["select name,cost from fruit where cost = 12"]
             :row-fn add-tax)
;; -> ({:tax 0.96, :cost 12, :name "Plum"} {:tax 0.96, :cost 12, :name "Fig"})
Alan Thompson
  • 29,276
  • 6
  • 41
  • 48
  • 1
    Unfortunately this didn't work with Postgres. See the link in my answer. You need to set autoCommit to false and the fetchSize in order to make fetching with a cursor work or else the whole result set will still be in memory. Verify by turning on Postgres logs: LOG: execute fetch from S_1/C_2: select * from ... The word 'fetch' here is what you should see with using a cursor. Another check you can do is 'select all' on a big table and record the time vs. a limited query. You'll see that the limited query is much faster, but with fetching it doesn't matter. – Michiel Borkent Sep 30 '16 at 07:37
  • 1
    You'll see that the limited query is much faster, but with using a cursor it doesn't matter => that is, if you only realize for example the first element. – Michiel Borkent Sep 30 '16 at 07:43
  • 1
    So perhaps is it only lazy on the Clojure side, not on the JDBC/Postgres side then... I haven't experimented that far. You may wish to send this question to the clojure Google Group, as the maintainer of clojure.java.jdbc is frequently there. – Alan Thompson Sep 30 '16 at 13:52
  • 1
    Exactly. I already found the answer (http://stackoverflow.com/a/39775018/6264) . There's an open issue for `:fetch-size` not being passed down to prepare-statement in clojure.java.jdbc. – Michiel Borkent Sep 30 '16 at 14:52
  • I've added an answer here that talks about setting `:auto-commit?` and `:fetch-size` and the new (0.7.0+) `reducible-query` function as the best way to deal with this, with links to the recently updated community documentation about `clojure.java.jdbc`. – Sean Corfield May 29 '18 at 16:47