How to run 2 SQL "SELECT"s atomically? Or any other better way to get number of rows before i process them

Question

I'm trying to run atomically

ResultSet resSet;
resSet = statement.executeQuery("SELECT COUNT(*) FROM table");
resSet.next()
long rowCount = resSet.getLong(1);
resSet = statement.executeQuery("SELECT * FROM table");
// read data of known row count...

My question is what is the best way?

Currently I found out I can do:

connection.setAutoCommit(false);
connection.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE)
// call 2 SQL queries above
connection.commit();

This way seems to work. I tested that another thread is blocked to perform INSERT in between first SELECT and commit(). Is that correct and optimal way? Can I be sure that this way my COUNT will always be same as rows count returned from next select?

Also I would expect that instead of Connection.TRANSACTION_SERIALIZABLE the Connection.TRANSACTION_REPEATABLE_READ shall be sufficient. But it does not not work in Derby 10.11.1.1. Is it a bug? I'm new to database business but it works as expected for H2 database - therefore I expect it might be a derby bug...

Note that I already know about solution where you can do:

statement = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, 
                                 ResultSet.CONCUR_READ_ONLY);
resultSet = statement.executeQuery("SELECT * FROM table");
if (ResultSet.last()) {
    int rowCount = ResultSet.getRow();
    ResultSet.beforeFirst(); 
}
while(ResultSet.next()){...}

But this solution is not optimal. For derby I measured it is ~7 times slower. For H2 it is ~2 times slower.

And why not simply execute the second query, and increment a counter for each read row. — JB Nizet, May 20 '15 at 20:53
also note funny thing H2 database will print result correctly when `connection.commit()` is right after `executeQuery()` before `resultSet.next()`. But Derby database would give you modified data. Thus in Derby to make it work atomically you need to put `commit()` after you read out data from `resultSet` — Vit Bernatik, May 20 '15 at 20:58
@JB Nizent - Sorry I should have mention that. My intention is know row count before. I'm converting big database and I want to provide progress bar. For that I need to know before the row count. — Vit Bernatik, May 20 '15 at 21:00
Frankly, I wouldn't care much about isolation for such a use case. Imagine you're unlucky and a row is added between the count query and the second query. So what? Your progress bar will be at 100% instead of 99.999% when processing the n-1th element. Is that really a problem? — JB Nizet, May 20 '15 at 21:03
well actually you are right... But I'm already too interested in exact best solution. Call me perfectionist... — Vit Bernatik, May 20 '15 at 21:31
@VitBernatik you can initialize the connection with those parameters for all your app when defining the url connection, so there's no need to call `Connection#setAutocommit` nor `Connection#setTransactionIsolation` at all and all the connections will be created like this. Also, it would be better to call the second query only and use a counter variable to check how many rows you got instead of executing the (same heavy) query twice. — Luiggi Mendoza, May 21 '15 at 03:01
@Luggi Mendoza: Actually for H2 `SELECT COUNT(*)` is not a heavy command it takes for 200K rows only 0.2ms. For Derby it takes 220ms so another reason to go with H2. As I wrote above I want to know number of rows before I process them (in my case to provide progress bar). Thx for tip in URL. — Vit Bernatik, May 21 '15 at 07:38
It is highly DB (version, settings, driver) specific how well the serialisation works and how inefficient it is. I would try to avoid it at all costs. — eckes, May 21 '15 at 23:35
Well thx for hint. But can you give tested combination where it is actually slower for big tables than other approaches? Also for my example and H2 database it seems sufficient to use only `Connection.TRANSACTION_REPEATABLE_READ`. Do you feel we shall avoid that as well? Also we can only enable it just before those 2 `SELECT`s, and then disable it again. — Vit Bernatik, May 22 '15 at 10:45

score 0 · Answer 1 · answered May 20 '15 at 21:13

0

how about

SELECT (SELECT COUNT(*) FROM table) c, * FROM table

answered May 20 '15 at 21:13

Wow thx. But I'm not sure if it is not too big overhead. Imagine I have 1M lines and for each I would have copied this count, whereas I need the count only once. – Vit Bernatik May 20 '15 at 21:40
Also it works for H2, but Derby does not understand such SQL statement. It complains about second asterisk `java.sql.SQLSyntaxErrorException: Syntax error: Encountered "*" at line 1, column 43.` – Vit Bernatik May 20 '15 at 21:49
1

You should enumarate all column names instead of "*". Inner SELECT is a constant value in outer SELECT, so it will be evaluated once, I think. But it depends on implementation. – May 20 '15 at 22:09
Yup when I explicitly list columns like this `SELECT (SELECT COUNT(*) FROM table) c, ID, VAR1 FROM table` it works also for Derby DB. – Vit Bernatik May 21 '15 at 08:03
So I did test this approach and it seems intriguing. It broadered my SQL knowledge. But it is NOT faster. My tests on 200K rows (4 columns). Shows that 2 consecutive read is faster than combined select for embedded database. In case of Derby I save 15% of time (~109ms). In case of H2 I save 35% of time (652ms). Total read times (for faster consecutive read version) are 777ms for Derby and 1839 ms for H2 database. (Averaged out of 5 measurements) – Vit Bernatik May 21 '15 at 12:15

How to run 2 SQL "SELECT"s atomically? Or any other better way to get number of rows before i process them

1 Answers1

Linked