3

Based on my reading, I see that the way to stream a ResultSet in MySQL using the MySQL JDBC driver is these two commands:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

My question is could an expert clarify if streaming the ResultSet using above code returns one row to client, then go to server to fetch next row and so on (terribly inefficient) or whether it is smart enough to do buffered streaming like a BufferedStreamReader? If it does buffered streaming, how to set the buffer size?

EDIT: From the doc:

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.

Does this mean that if I have 10M rows then there are 10M roundtrips to the server to get these rows? This is terribly inefficient. How can I stream the ResultSet but have it buffered so that I don't have to make so many roundtrips?

EDIT2: It seems MySQL does some buffering automatically when fetchSize is set to Integer.MIN_VALUE. In my test I was able to read more than 40M rows in less than 20 minutes using setFetchSize(Integer.MIN_VALUE). This translates to about 30,000 rows per second. I don't know how big average row was but its hard to imagine 30,000 roundtrips per second.

Also a separate question: what does MySQL do if the result set has more elements than the fetchSize? e.g., result set has 10M rows and fetchSize is set to 1000. What happens then?

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
morpheus
  • 18,676
  • 24
  • 96
  • 159
  • 1
    Please check this http://stackoverflow.com/questions/24704882/setting-fetch-size-to-negative-number – Acewin Feb 11 '16 at 18:33
  • 1
    another interesting read http://stackoverflow.com/questions/25019410/confusion-with-setfetchsize-method-of-statement-object – Acewin Feb 11 '16 at 18:36
  • fetchsize defines the buffer size for streaming. and yes in this case the buffer size is 1 – Acewin Feb 11 '16 at 19:16
  • 1
    please re-read my question and see if you can make out the difference between what is being asked here and elsewhere. – morpheus Feb 11 '16 at 19:44
  • yes words are different but question is not. – Acewin Feb 11 '16 at 19:45
  • if streaming the ResultSet using above code returns one row to client, then go to server to fetch next row and so on (terribly inefficient). Yes it fetches each row at a time. – Acewin Feb 11 '16 at 19:46
  • Complete result set is streamed. So yes in a way it is buffered Stream. and the buffersize to fetch is 1 – Acewin Feb 11 '16 at 19:47
  • If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time. – Acewin Feb 11 '16 at 19:50
  • this definitely is not the most efficient way to fetch a resultset. – Acewin Feb 11 '16 at 19:51

1 Answers1

4

It seems MySQL does some buffering automatically when fetchSize is set to Integer.MIN_VALUE.

It does, at least sometimes. I tested the behaviour of MySQL Connector/J version 5.1.37 using Wireshark. For the table ...

CREATE TABLE lorem (
    id INT AUTO_INCREMENT PRIMARY KEY,
    tag VARCHAR(7),
    text1 VARCHAR(255),
    text2 VARCHAR(255)
    )

... with test data ...

 id  tag      text1            text2
---  -------  ---------------  ---------------
  0  row_000  Lorem ipsum ...  Lorem ipsum ...
  1  row_001  Lorem ipsum ...  Lorem ipsum ...
  2  row_002  Lorem ipsum ...  Lorem ipsum ...
...
999  row_999  Lorem ipsum ...  Lorem ipsum ...

(where both `text1` and `text2` actually contain 255 characters in each row)

... and the code ...

try (Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY)) {
    s.setFetchSize(Integer.MIN_VALUE);
    String sql = "SELECT * FROM lorem ORDER BY id";
    try (ResultSet rs = s.executeQuery(sql)) {

... immediately after the s.executeQuery(sql) – i.e., before rs.next() is even called – MySQL Connector/J had retrieved the first ~140 rows from the table.

In fact, when querying just the tag column

    String sql = "SELECT tag FROM lorem ORDER BY id";

MySQL Connector/J immediately retrieved all 1000 rows as shown by the Wireshark list of network frames:

framelist.png

Frame 19, which sent the query to the server, looked like this:

frame19.png

The MySQL server responded with frame 20, which started with ...

frame20.png

... and was immediately followed by frame 21, which began with ...

frame21.png

... and so on until the server had sent frame 32, which ended with

frame32.png

Since the only difference was the amount of information being returned for each row, we can conclude that MySQL Connector/J decides on an appropriate buffer size based on the maximum length of each returned row and the amount of free memory available.

what does MySQL do if the result set has more elements than the fetchSize? e.g., result set has 10M rows and fetchSize is set to 1000. What happens then?

MySQL Connector/J initially retrieves the first fetchSize group of rows, then as rs.next() moves through them it will eventually retrieve the next group of rows. That is true even for setFetchSize(1) which, incidentally, is the way to really get only one row at a time.

(Note that setFetchSize(n) for n>0 requires useCursorFetch=true in the connection URL. That is apparently not required for setFetchSize(Integer.MIN_VALUE).)

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • 1
    Interesting that this doesn't seem to match with what is described in the documentation (and that they even neglect to mention `useCursorFetch`) on http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html . It does makes me wonder if `useCursorFetch` has the same caveats as with using `Integer.MIN_VALUE`. – Mark Rotteveel Feb 15 '16 at 15:18
  • are you able to share details on how you tested the behavior using Wireshark? – morpheus Feb 16 '16 at 19:49
  • May I please ask which version/s of the driver was this deep research/testing performed against? – matanster Mar 09 '16 at 09:31
  • 1
    @matanster The tests were performed using MySQL Connector/J version 5.1.37. – Gord Thompson Mar 09 '16 at 12:17