
I need to make a large number of inserts, i.e. double-digit millions, into an Oracle DB using JDBC. To do so I'm using something like the following class, inspired by Efficient way to do batch INSERTS with JDBC:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Inserter {
    private final int batchSize;
    private final Connection con; // with .setAutoCommit(false)
    private final PreparedStatement ps;
    private int currentSize = 0;

    public Inserter(Connection con, PreparedStatement ps, int batchSize) {
        this.con = con;
        this.ps = ps;
        this.batchSize = batchSize;
    }

    public void addInsert(Object[] vals) throws SQLException {
        ps.clearParameters(); // should be redundant, but better safe than sorry
        for (int i = 0; i < vals.length; i++) {
            this.ps.setObject(i + 1, vals[i]);
        }
        ps.addBatch();
        currentSize++;

        if (currentSize >= batchSize) {
            ps.executeBatch();
            currentSize = 0;
        }
    }

    public void flush() throws SQLException { // flush leftovers
        if (currentSize > 0) {
            ps.executeBatch();
            currentSize = 0;
        }
    }
}
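
For context, the class is used roughly like this (connection details, table name, and the rows source are placeholders, not my actual setup):

Connection con = DriverManager.getConnection(url, user, pass); // hypothetical connection details
con.setAutoCommit(false);
PreparedStatement ps = con.prepareStatement(
        "INSERT INTO some_table (id1, id2, id3, id4) VALUES (?, ?, ?, ?)");

Inserter inserter = new Inserter(con, ps, 10_000);
for (Object[] row : rows) { // rows: the data to load, e.g. an Iterable<Object[]>
    inserter.addInsert(row);
}
inserter.flush(); // execute the last partial batch
con.commit();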

While this way of inserting works fine, it is very slow. JDBC batch insert performance describes how essentially this exact issue can be dealt with for MySQL; however, since rewriteBatchedStatements does not seem to exist on Oracle, it is not of much help here.
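
For comparison, on MySQL that fix is a single connection property; I have found no equivalent flag in the Oracle driver:

// MySQL Connector/J rewrites batched INSERTs into multi-row statements
// when this URL flag is set (shown only for contrast; host/db are placeholders):
String mysqlUrl = "jdbc:mysql://host:3306/db?rewriteBatchedStatements=true";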

To improve performance I have also tried switching the statement to one big INSERT ALL .../INSERT /*+ APPEND */ ... statement, as per Oracle 11g - most efficient way of inserting multiple rows, which made everything even slower.
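
For reference, the multi-row statement was assembled roughly like this (a sketch with hypothetical table and column names):

// Oracle's INSERT ALL form: one statement inserting many rows,
// terminated by a mandatory SELECT.
StringBuilder sql = new StringBuilder("INSERT ALL\n");
for (int i = 0; i < rowsPerStatement; i++) { // rowsPerStatement: an assumed tuning knob
    sql.append("INTO some_table (id1, id2, id3, id4) VALUES (?, ?, ?, ?)\n");
}
sql.append("SELECT * FROM dual");
PreparedStatement multi = con.prepareStatement(sql.toString());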

My question would thus be: is there any way to optimize these inserts beyond simply using addBatch() and executeBatch()? Or is there some gross, inefficient mistake in the Inserter class above? Any help would be very much appreciated.


Some possibly useful additional information:

  • The tables to be inserted to are partitioned with around one to ten million rows per partition.

  • There is a unique constraint on the tables, looking something like unique(id1, id2, id3, id4), where all columns in it are of type NUMBER and are further bound by foreign key constraints to primary keys in other tables.


EDIT:

Following the suggestions from the comments I switched the setObject(index, val) calls to the following (see the sketch after this list):

  1. setInt(index, val), setFloat(index, val), ..., setNull(index, type) calls where appropriate

  2. setObject(index, val, type) and setNull(index, type)
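
A minimal sketch of the switched binding, assuming the column types are known up front (the sqlTypes array of java.sql.Types constants is an assumption for the example):

private final int[] sqlTypes; // assumed: one java.sql.Types constant per column

private void bind(Object[] vals) throws SQLException {
    for (int i = 0; i < vals.length; i++) {
        Object v = vals[i];
        if (v == null) {
            ps.setNull(i + 1, sqlTypes[i]);      // both variants: typed NULL
        } else if (v instanceof Integer) {
            ps.setInt(i + 1, (Integer) v);       // variant 1: specialized setters
        } else if (v instanceof Float) {
            ps.setFloat(i + 1, (Float) v);
        } else {
            ps.setObject(i + 1, v, sqlTypes[i]); // variant 2: typed setObject
        }
    }
}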

Neither version improved performance significantly.

Furthermore I tried inserting the data into a staging table without any constraints on it, which also did not lead to better performance.

To compare, exporting the data to CSV and loading it using SQL*Loader led to a significant performance increase, i.e. ~4.5k => ~50k+ rows per second for the slowest table1.

This leads me to believe that the bottleneck lies with JDBC.

1Alas, the use of SQL*Loader is not a (desired) option in my particular case.

  • How large is `batchSize`? – Andreas Jan 30 '19 at 16:58
  • @Andreas I've tried `{1000,10000,25000,100000}` thus far, all were quite slow – abcalphabet Jan 30 '19 at 17:03
  • Have you tried loading your 10+ million records using the SQL*Loader, just to get a baseline idea of the database server performance of doing that many inserts into a table with unique indexes and foreign keys? – Andreas Jan 30 '19 at 17:03
  • @Andreas Yeah, have done that, it's roughly ten times faster with SQL*Loader – abcalphabet Jan 30 '19 at 17:06
  • I've used similar code in the past and successfully inserted about 100,000 rows a second into Oracle on the same machine. How many rows can you insert in a second? My code didn't use `setObject` but more specific `setInt` and `setString`. And the target was a staging table – initially empty and without any indexes. Try a staging table as well to check if the bottleneck is with JDBC or with the target table. Batch size was 50 if I remember correctly. – Codo Jan 30 '19 at 20:17
  • Do you have `null` values as parameters? If yes Oracle may treat inserts with null parameter at specified index and non-null parameter as different SQLs and will switch query plans. You'd better use `ps.setNull(index, objectType)` to set null values instead of `ps.setObject()` – Ivan Jan 30 '19 at 21:27
  • @Codo I'm only getting around 4k per second... Will try it without `setObject` and using a staging table now – abcalphabet Jan 31 '19 at 07:42
  • @Ivan Yeah, I have null values, a lot even, I'll start using `setNull` for those and report back – abcalphabet Jan 31 '19 at 07:44
  • What is the version of the JDBC driver? (Note the number in the filename is **not** the driver version, it's the intended Java version). In my experience using explicit `setInt()` or `setString()` calls (whatever is valid for the parameter's type) is substantially faster than using `setObject()` for everything with the Oracle driver – a_horse_with_no_name Jan 31 '19 at 13:15
  • @a_horse_with_no_name The driver version is `Oracle 12.1.0.1.0 JDBC 4.1`. Yeah, I would've expected at least some speedup as well, but to my surprise it was negligible at best. – abcalphabet Jan 31 '19 at 13:30
  • ~4.5k rows per second is slow in your opinion? Please tell what speed would be good for you, 10k rows/sec, 100k rows/sec? Maybe it's not a software problem, but a hardware problem. – krokodilko Jan 31 '19 at 18:53
  • Can you upload a .jfr? I'd be happy to take a look at it. – Douglas Surber Feb 04 '19 at 15:07
  • @DouglasSurber Thanks for the offer, sadly I don't have any way to get files out of my development machine/environment. I've found out that there's been some general issues with my test server apparently, hopefully once those are resolved my problem will go away too. – abcalphabet Feb 05 '19 at 11:36

1 Answer


You never commit the batch. If possible, add a commit after executeBatch(); if the inserts are not committed, you will create large rollback segments that can slow down the database. Also remove ps.clearParameters(), since you always overwrite either all parameters or none. Better to use the specialized setter versions instead of setObject().
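
A sketch of that suggestion applied inside addInsert (assuming autocommit stays off, as in the question):

if (currentSize >= batchSize) {
    ps.executeBatch();
    con.commit(); // commit per batch to keep rollback/undo segments small
    currentSize = 0;
}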
