I need to make a large amount of inserts, i.e. double digit millions, into an Oracle-DB using JDBC. To do so I'm using something like the following class, inspired by Efficient way to do batch INSERTS with JDBC:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Inserter {
    private final int batchSize;
    private final Connection con; // with .setAutoCommit(false)
    private final PreparedStatement ps;
    private int currentSize = 0;

    public Inserter(Connection con, PreparedStatement ps, int batchSize) {
        this.con = con;
        this.ps = ps;
        this.batchSize = batchSize;
    }

    public void addInsert(Object[] vals) throws SQLException {
        ps.clearParameters(); // should be redundant, but better safe than sorry
        for (int i = 0; i < vals.length; i++) {
            this.ps.setObject(i + 1, vals[i]);
        }
        ps.addBatch();
        currentSize++;
        if (currentSize >= batchSize) {
            ps.executeBatch();
            currentSize = 0;
        }
    }

    public void flush() {/** to flush leftovers */}
}
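For completeness, the class is driven roughly along these lines; the table and column names below are just placeholders, the batch size is an arbitrary example, and rows stands in for whatever actually produces the value arrays:

// Illustrative driver code for the Inserter above; names and values are made up,
// the commit placement just reflects how the class is meant to be used.
try (Connection con = DriverManager.getConnection(url, user, password)) {
    con.setAutoCommit(false);
    try (PreparedStatement ps = con.prepareStatement(
            "INSERT INTO target_table (id1, id2, id3, id4, payload) VALUES (?, ?, ?, ?, ?)")) {
        Inserter inserter = new Inserter(con, ps, 10_000); // batch size is just an example
        for (Object[] row : rows) {
            inserter.addInsert(row);
        }
        inserter.flush(); // executes the remaining, partially filled batch
        con.commit();
    }
}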
While this way of inserting works fine, it is very slow. JDBC batch insert performance describes how essentially this exact issue can be dealt with for MySQL; however, since rewriteBatchedStatements does not seem to exist for Oracle, that is not of much help here.
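For reference, on MySQL that rewrite is enabled purely through a connection property; the URL below is only illustrative:

// MySQL only: with this property the Connector/J driver rewrites a batch of
// single-row INSERTs into multi-row INSERTs. Oracle's JDBC driver offers no
// equivalent switch.
String mysqlUrl = "jdbc:mysql://host:3306/db?rewriteBatchedStatements=true";
Connection mysqlCon = DriverManager.getConnection(mysqlUrl, "user", "password");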
To improve performance I have also tried to switch the statement to one big INSERT ALL ... / INSERT APPEND ... statement, as per Oracle 11g - most efficient way of inserting multiple rows, which made everything even slower.
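The multi-row variant looked roughly like the following sketch (table and column names are invented, and the row count per statement is arbitrary):

// Rough shape of the INSERT ALL attempt: one statement carries many rows,
// each row binding its own set of parameters.
int rowsPerStatement = 500; // arbitrary example value
StringBuilder sql = new StringBuilder("INSERT ALL ");
for (int i = 0; i < rowsPerStatement; i++) {
    sql.append("INTO target_table (id1, id2, id3, id4, payload) VALUES (?, ?, ?, ?, ?) ");
}
sql.append("SELECT 1 FROM DUAL");
PreparedStatement multiRowPs = con.prepareStatement(sql.toString());
// parameters are then bound positionally: row r, column c -> index r * 5 + c + 1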
My question would thus be: is there any way to optimize these inserts beyond simply using addBatch() and executeBatch()? Or is there maybe some gross, inefficient mistake in the Inserter class above? Any help would be very much appreciated.
More possibly useful information:
The tables being inserted into are partitioned, with around one to ten million rows per partition.
There is a unique constraint on the tables looking something like unique(id1, id2, id3, id4), where all columns in it are of type NUMBER and are further bound by foreign key constraints to primary keys in other tables.
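For illustration, a hypothetical table matching that description could look roughly like this (every name here is invented):

// Hypothetical DDL approximating the tables described above; all names are invented.
// Four NUMBER columns form the unique key, each also referencing another table,
// and the table is range-partitioned into partitions of a few million rows each.
String ddl =
      "CREATE TABLE target_table ("
    + "  id1 NUMBER NOT NULL REFERENCES parent1 (id),"
    + "  id2 NUMBER NOT NULL REFERENCES parent2 (id),"
    + "  id3 NUMBER NOT NULL REFERENCES parent3 (id),"
    + "  id4 NUMBER NOT NULL REFERENCES parent4 (id),"
    + "  payload NUMBER,"
    + "  CONSTRAINT target_uq UNIQUE (id1, id2, id3, id4)"
    + ") PARTITION BY RANGE (id1) ("
    + "  PARTITION p1 VALUES LESS THAN (1000000),"
    + "  PARTITION pmax VALUES LESS THAN (MAXVALUE))";
try (Statement stmt = con.createStatement()) {
    stmt.execute(ddl);
}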
EDIT:
Following the suggestions from the comments, I switched the setObject(index, val) calls to typed setInt(index, val), setFloat(index, val), ..., and setNull(index, type) calls where appropriate, and in a second variant to setObject(index, val, type) together with setNull(index, type).
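For concreteness, the typed-setter variant of addInsert looked roughly like this (the column layout is invented for illustration; the real tables differ):

// Sketch of the typed-setter variant; the column layout here is made up.
// Nullable columns go through setNull() with the matching java.sql.Types constant.
public void addInsert(int id1, int id2, int id3, int id4, Float payload)
        throws SQLException {
    ps.setInt(1, id1);
    ps.setInt(2, id2);
    ps.setInt(3, id3);
    ps.setInt(4, id4);
    if (payload != null) {
        ps.setFloat(5, payload);
    } else {
        ps.setNull(5, java.sql.Types.FLOAT);
    }
    ps.addBatch();
    if (++currentSize >= batchSize) {
        ps.executeBatch();
        currentSize = 0;
    }
}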
Neither version improved performance significantly.
Furthermore I tried inserting the data into a staging table without any constraints on it, which also did not lead to better performance.
To compare, exporting the data to CSV and loading it in using SQL*Loader led to a significant performance increase, i.e. ~4.5k => ~50k+ rows per second for the slowest table¹.
This leads me to believe that the bottleneck lies with JDBC.
¹ Alas, the use of SQL*Loader is not a (desired) option in my particular case.