3

I am having trouble streaming a large result-set from a MySQL database using clojure.java.jdbc. This is what I'm trying now:

(defn etl! [query result-set-fn]
  (jdbc/with-db-transaction [t-conn db-spec]
    (let [conn (jdbc/get-connection t-conn)
          statement (jdbc/prepare-statement conn query
                                            :fetch-size Integer/MIN_VALUE
                                            :concurrency :read-only
                                            :result-type :forward-only)]
      (jdbc/query conn [statement] :result-set-fn first))))

I have also tried it without the transaction and with (.setAutoCommit conn False) (which is necessary for Postgres). According to the MySQL docs, setting that fetch-size, concurrency and result-type should tell MySQL to deliver the results one at a time, but that doesn't seem to be happening; the query hangs and heap consumption rises steadily for at least hundreds of megabytes.

A similar question was asked a few years ago, but the answer has now been deprecated by clojure.java.jdbc: Streaming from MySQL with clojure.java.jdbc

Any ideas?

Community
  • 1
  • 1
Logan Buckley
  • 211
  • 1
  • 4
  • I have not played with mysql but have used Postgres recently. Using either :result-set-fn or :row-fn did work lazily for me in that instance. You may wish to try a minimal test with Postgres and see if that solves the problem. Also, for reads, I don't see any benefit to (with-db-transaction ...) – Alan Thompson Jul 05 '15 at 22:21
  • Yes, I do this more or less exactly in postgres at work and it works perfectly well. It may be an issue with the MySQL drivers. The only reason I tried the transaction was that the other answer linked above recommended it, and given the amount of ceremony and incantations that are necessary to ostensibly get MySQL to give you things one at a time (MIN_VALUE, etc) it seemed plausible. I'm using MySQL in this case because I'm working with wikipedia data, but maybe it's worth the hassle to just migrate it over to postgres at this point. – Logan Buckley Jul 06 '15 at 00:08

1 Answers1

0

This is very similar to this question: clojure.java.jdbc lazy query which has several answers, including an up-to-date one from me that refers to the recently added reducible-query. I haven't directly confirmed what combination of options you need to persuade MySQL to stream result sets but :fetch-size and, perhaps, the various cursor settings might be useful. PostgreSQL requires :auto-commit? false as well -- I'm not sure whether you might also need that for MySQL.

Sean Corfield
  • 6,297
  • 22
  • 31