
I am using Java to read from a SQL RDBMS and return the results to the user. The problem is that the database table has 155 million rows, which makes the wait time really long.

I wanted to know if it is possible to retrieve results as they come from the database and present them incrementally to the user (in batches).

My query is a simple SELECT * FROM Table_Name query.

Is there a mechanism or technology that can give me callbacks of DB records, in batches until the SELECT query finishes?

The RDBMS that is used is MS SQL Server 2008.

Thanks in advance.

    are you trying to return all 155 million rows to the end user at once? – Woot4Moo May 23 '13 at 14:18
  • depending on the RDBMS, select only a limited number of rows: for MySQL use `LIMIT`, for SQL Server use `TOP n` – Alya'a Gamal May 23 '13 at 14:20
  • +1 above comment. What can a user meaningfully do with 155m rows? Also, which database? JDBC "supports" setting the number of rows to fetch in [Statement](http://docs.oracle.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)) and [ResultSet](http://docs.oracle.com/javase/6/docs/api/java/sql/ResultSet.html#setFetchSize(int)). Unfortunately, these aren't necessarily implemented, depending on your JDBC vendor. – wmorrison365 May 23 '13 at 14:25
  • @Woot4Moo indeed, i need all 155 M rows to be returned. – nikkatsa May 23 '13 at 14:51
  • @wmorrison365 the application is doing data mining, thus i need all 155M rows. Further operations on the rows are happening later (e.g. pivot) – nikkatsa May 23 '13 at 14:51
  • 1
    My only suggestion is to find a way to implement most of the functionality on the server. RDBMS are designed (and, hopefully, optimized)) for such tasks. If it's more than a single query, then consider stored procedures. – PM 77-1 May 23 '13 at 15:24

3 Answers


Methods Statement#setFetchSize and Statement#getMoreResults are supposed to allow you to manage incremental fetches from the database. Unfortunately, this is the interface spec and vendors may or may not implement these. Memory management during a fetch is really down to the vendor (which is why I wouldn't strictly say that "JDBC just works like this").

From the JDBC documentation on Statement :

setFetchSize(int rows)

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.

getMoreResults()

Moves to this Statement object's next result, returns true if it is a ResultSet object, and implicitly closes any current ResultSet object(s) obtained with the method getResultSet.

getMoreResults(int current)

Moves to this Statement object's next result, deals with any current ResultSet object(s) according to the instructions specified by the given flag, and returns true if the next result is a ResultSet object. The current parameter indicates whether to keep or close the current ResultSet.
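A minimal sketch of the setFetchSize approach, assuming placeholders throughout (the table name, and reading only the first column as a stand-in for real row mapping): the Statement is opened forward-only and read-only, the fetch size is passed as a hint, and rows are handed to a callback in client-side batches. Whether the driver actually streams in chunks is vendor-specific, as noted above.

```java
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedReader {

    // Accumulates rows in a buffer and flushes a copy of the full batch
    // to the callback, then clears the buffer for reuse.
    static <T> void addToBatch(List<T> buffer, T row, int batchSize,
                               Consumer<List<T>> callback) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            callback.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Streams the whole table, handing rows to the callback in batches.
    // setFetchSize is only a hint; honouring it is up to the driver.
    static void readInBatches(Connection con, int batchSize,
                              Consumer<List<String>> callback) throws SQLException {
        try (Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(batchSize);  // hint: fetch this many rows at a time
            try (ResultSet rs = st.executeQuery("SELECT * FROM Table_Name")) {
                List<String> buffer = new ArrayList<>();
                while (rs.next()) {
                    addToBatch(buffer, rs.getString(1), batchSize, callback);
                }
                if (!buffer.isEmpty()) {
                    callback.accept(buffer);  // final partial batch
                }
            }
        }
    }
}
```

The batching helper is separate from the JDBC loop so the "present to the user in batches" part works the same regardless of how the driver buffers rows internally.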

Also, this SO response discusses the use of setFetchSize with regard to SQLServer 2005 and how it doesn't seem to manage batched fetches. The recommendation is to test this with the 2008 driver or, better, to use the jTDS driver (which gets a thumbs up in the comments).

This response to the same SO post may also be useful as it contains a link to SQLServer driver settings on MSDN.

There's also some good info on the MS technet website, though relating more to SQLServer 2005; I couldn't find the 2008-specific version in my cursory review. Anyway, it recommends creating the Statement with:

com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY (2004) scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance
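That recommendation might look like the sketch below. The TYPE_SS_SERVER_CURSOR_FORWARD_ONLY constant (value 2004) and the selectMethod=cursor connection property are specific to Microsoft's JDBC driver; the host, database name, and fetch size are placeholders, and I haven't verified this against the 2008 driver.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorConnection {

    // selectMethod=cursor asks Microsoft's driver to use a server-side
    // cursor instead of buffering the full result set.
    static String buildUrl(String host, String database) {
        return "jdbc:sqlserver://" + host + ";databaseName=" + database
                + ";selectMethod=cursor";
    }

    // Mirrors com.microsoft.sqlserver.jdbc.SQLServerResultSet
    //   .TYPE_SS_SERVER_CURSOR_FORWARD_ONLY, so this sketch compiles
    // without the vendor jar on the classpath.
    static final int TYPE_SS_SERVER_CURSOR_FORWARD_ONLY = 2004;

    // Forward-only, read-only statement with a tuned fetch size,
    // as the technet article suggests.
    static Statement forwardOnlyStatement(Connection con) throws SQLException {
        Statement st = con.createStatement(
                TYPE_SS_SERVER_CURSOR_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(1000);  // tune for your row size and memory budget
        return st;
    }
}
```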


Using offset pagination (LIMIT pageno, rows in MySQL, TOP n in SQL Server) can produce holes and duplicates if the table changes between pages, but it works well in combination with keyset filtering on the last seen row ID (WHERE id > ? ORDER BY id LIMIT 0, 100).

You may also open the ResultSet with TYPE_FORWARD_ONLY and the fetch direction FETCH_FORWARD.
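A sketch of the keyset variant, with the table name and id column as placeholders: each page asks for the next TOP n rows whose id is greater than the last id already seen, so deleted or inserted rows can't shift the page boundaries.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class KeysetPager {

    // Builds the next-page query: TOP n rows with id greater than the
    // last id already seen, ordered by id (SQL Server syntax).
    static String buildPageQuery(String table, String idColumn, int pageSize) {
        return "SELECT TOP " + pageSize + " * FROM " + table
                + " WHERE " + idColumn + " > ? ORDER BY " + idColumn;
    }

    // Pages through the whole table; returns the number of rows visited.
    static long pageThrough(Connection con, int pageSize) throws SQLException {
        String sql = buildPageQuery("Table_Name", "id", pageSize);
        long lastId = Long.MIN_VALUE;
        long total = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            while (true) {
                ps.setLong(1, lastId);
                int rowsInPage = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");  // key of the last row seen
                        rowsInPage++;
                        total++;
                        // process the row / hand it to the user here
                    }
                }
                if (rowsInPage < pageSize) break;  // last (possibly short) page
            }
        }
        return total;
    }
}
```

This assumes id is unique and indexed; with 155M rows the index matters, since each page query must seek past everything already read.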


This is exactly how a JDBC driver is supposed to work (I remember a bug in an old PostgreSQL driver that caused all fetched records to be stored in memory).

When it works, it lets you read records as soon as the query starts returning them. This is where I would start to search.

For example, Oracle optimizes SELECT * queries for fetching the whole result set, which means it can take a long time before the first results appear. You can give it hints (such as FIRST_ROWS) to optimize for fetching the first results instead, so you can show the first rows to your user quite fast, even though the whole query takes longer to execute.

You should test your query in a console first, to check when it starts returning results. Then try it with JDBC and monitor the memory usage while you iterate through the ResultSet. If memory usage grows fast, check that you have opened the ResultSet in forward-only, read-only mode, and update the driver if necessary.

If such a solution is not feasible because of memory usage, you can still use cursors manually and fetch N rows (say, 100) with each query.

Cursor documentation for MSSQL: http://msdn.microsoft.com/en-us/library/ms180152.aspx
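Driving a T-SQL server cursor from JDBC might look like the sketch below. The cursor name and table are placeholders, all statements must run on the same connection (the cursor lives in the session), and note that T-SQL's FETCH NEXT returns one row per call, so batching into groups of N still happens on the client side. I haven't verified this against SQL Server 2008.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ManualCursor {

    // T-SQL cursor lifecycle, per the MSDN DECLARE CURSOR documentation.
    static final String DECLARE =
        "DECLARE big_cursor CURSOR FORWARD_ONLY READ_ONLY FOR "
        + "SELECT * FROM Table_Name";
    static final String OPEN  = "OPEN big_cursor";
    static final String FETCH = "FETCH NEXT FROM big_cursor";
    static final String CLOSE = "CLOSE big_cursor; DEALLOCATE big_cursor";

    static void iterate(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(DECLARE);
            st.execute(OPEN);
            try {
                while (true) {
                    // Each FETCH returns at most one row; an empty result
                    // set means the cursor is exhausted.
                    try (ResultSet rs = st.executeQuery(FETCH)) {
                        if (!rs.next()) break;
                        // process the single fetched row here
                    }
                }
            } finally {
                st.execute(CLOSE);  // always release the server-side cursor
            }
        }
    }
}
```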

  • Thanks for the answer. This MS cursor and fetch functionality should do the job, I guess. I'll give that a try and post any results. – nikkatsa May 23 '13 at 15:29