9

I have a code like this below,

try (Connection connection = this.getDataSource().getConnection();
         PreparedStatement statement = connection.prepareStatement(sqlQuery);) {


        try {
            statement.setFetchSize(10000); // Set fetch size
            resultSet = statement.executeQuery();

            while (true) {
                resultSet.setFetchSize(10000);
                boolean more = resultSet.next();
                if (! more) {
                    break;
                }
                // populating an arraylist from the value from resultSet
            }
        }
        catch (Exception e) {
            LOGGER.error("Exception : "+e);
        }
    } catch (SQLException e) {
        LOGGER.error("Exception : "+e);
    }

My understanding is as follows,

The statement fetch size is 10000. when statement.executeQuery() is executed, it returns the ResultSet cursor. It will have 10000 rows in memory. When resultSet.next is called, it gets one row from the memory buffer. (one row per call). When no more rows in memory, the query is fired again and 10000 rows are fetched from database again and stored in buffer. This continues until there is no rows to be fetched from DB

So if my understanding is correct, how many actual DB calls will be there for a total rows of 210000? Is it 21 ? (210000 / 10000)

Also when and who calls the DB (when rows in buffer are all read) to get more rows (10000, in my case) and store in buffer. Also when is the buffer cleared ?

Please correct me if I am wrong in my understanding.

I need to work with millions of data in Oracle Database.

Thanks for any pointers/info

Regards,

SD

user184794
  • 1,036
  • 9
  • 15
  • The `setFetchSize#` method is supposed to be a hint. It is totally dependent on the JDBC driver to either use or ignore it. Also `who calls the DB` --> the driver does that. Got most of it from this [SO Q/A](http://stackoverflow.com/questions/1318354/what-does-statement-setfetchsizensize-method-really-do-in-sql-server-jdbc-driv) – boxed__l Feb 21 '17 at 21:59
  • In case of OCI driver, there is section in documentation which describes the amount of network round-trips per each OCI call. The JDBC driver is not documented this way - it provides higher level API. But you can use the OCI doc as introduction into the topic. – ibre5041 Feb 22 '17 at 07:37

2 Answers2

6

Sorry, but your understanding is wrong. There is no such thing as "query is fired again".

The execution of the query is done once. This will take an initial amount of time to process the query, (which you can't do anything about other than optimizing your query,) and then it will begin producing rows on the server, which need to be transferred to the client. While the rows are being transferred, the server will probably be continuing to generate more rows to be transferred, and buffering them on the server. This server-side buffering is totally unrelated to the kind of buffering that we are talking about in this Q&A, and you have very little control over it. (Perhaps by means of server configuration, if at all.) At some point all rows will have been collected on the server, and then the only remaining thing to do will be to transfer the remaining rows from the server to the client.

So, as far as the client can tell, once it has sent the query to the server, there is a certain delay while the server is thinking about it, after which rows are becoming available at a rate which is usually as fast as the wire can carry them. So, the client starts reading these rows with resultSet.next().

Without any buffering, each call to resultSet.next() would send a request from the client to the server, telling it to send the next row, and the server would respond with just that row. That would yield the first row very quickly, but it would be very inefficient in the long run, because it would be causing too many round-trips between the client and the server.

With buffering, the first call to resultSet.next() will request a bunch of rows from the server. This will impose a penalty on the time to receive the first row, because you are going to have to wait for 100 rows to be sent over the wire, but in the long run it will significantly reduce total network overhead, because there will be only one round-trip between the client and the server per bunch-of-rows.

The ideal strategy for resultSet.setFetchSize() is to leave it as it is and not worry too much about it.

But if you are paranoid about performance, then a good strategy would be to begin with a fairly small fetch size, (say 10,) so as to get your first row quickly, and then keep doubling it until it reaches a certain maximum (say 100,) beyond which there is really no improvement.

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • Your last paragraph seems to suggest that we can change the fetch size for an open ResultSet as we process it. Is that what you meant? – Gord Thompson Feb 22 '17 at 00:44
  • @GordThompson Well, that's what I meant. I just assume it works because it is not failing when the OP is using it. I may of course be wrong. I would assume it depends on the RDBMS implementation, but I have not looked at the standard. But something tells me your question is rhetorical. Why don't you just state what you know, so I may fix that last paragraph? – Mike Nakis Feb 22 '17 at 05:44
  • @GordThompson Changing the fetch size for an open cursor is allowed. Whether or not the JDBC driver supports it or just ignores it, is implementation-defined. – Mark Rotteveel Feb 22 '17 at 12:17
  • Thanks, @Mark. I asked because I was genuinely curious; I'd never seen that done before. – Gord Thompson Feb 22 '17 at 12:22
3

The only people who can reply to your question are the authors of the Oracle JDBC driver.

That being said a call to db to read the next chunk of data won't take more then a few ms (or less), the bulk of the time will depend on the transfer rate, and possibly how you get data from the resultset.

I think that once you go above a few hundred record per call you are into diminishing return setting a bigger fetch size.

About clearing the buffer, that's mostly garbage collection domain, once you loose reference to the resultset.

Just make sure your statement is FORWARD ONLY, both for performance reason and memory footprint.

connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,ResultSet.CONCUR_READ_ONLY );

minus
  • 2,646
  • 15
  • 18
  • useful link: [Oracle JDBC Documentation](https://docs.oracle.com/cd/E11882_01/java.112/e16548/resltset.htm#JJDBC28621) – Izruo Feb 21 '17 at 22:06
  • A fetch size of >1K probably does not make that much difference. If you want to see what is going on from the database side, you can enable SQL Tracing – BobC Feb 21 '17 at 22:07