1

I'm currently using the following query to insert into a table only if the record does not already exist, presumably this leads to a table scan. It inserts 28000 records in 10 minutes:

INSERT INTO tblExample(column)
(SELECT ? FROM tblExample WHERE column=? HAVING COUNT(*)=0)

If I change the query to the following, I can insert 98000 records in 10 minutes:

INSERT INTO tblExample(column) VALUES (?)

But it will not be checking whether the record already exists.

Could anyone suggest another way of querying such that my insert speed is faster?

Jamie Eltringham
  • 810
  • 3
  • 16
  • 25
jdie8274j
  • 155
  • 1
  • 10
  • 4
    Do you have a unique index on the column? That would both speed up the lookup query and enforce the constraint (in case you missed it in your code). – Thilo Apr 21 '16 at 09:32
  • Thanks @Thilo. I have taken the primary key off of the table because I have read that having a PK increases the time taken to insert. Are you suggesting that I use a PK and catch the error if a duplicate is found? – jdie8274j Apr 21 '16 at 09:36
  • Having the PK does slow down the insert, but if you are not sure that the data is unique, you need that constraint validation. So yes, put it back on. – Thilo Apr 21 '16 at 09:39
  • 1
    You could try the `derby` equivalent of an `UPSERT`. [Can I use MERGE INTO to simulate “upsert” in Apache Derby?](http://stackoverflow.com/q/34734653/823393) – OldCurmudgeon Apr 21 '16 at 09:40
  • Hi @Thilo. I have tried this, but when I do a batch insert, it then throws a BatchUpdateException because it violates the unique or primary key constraint. I can catch the error, but it will not process any more of the records at that point. – jdie8274j Apr 21 '16 at 12:47

2 Answers2

1

One simple solution (but not recommended) could be to simply have insert statement, catch duplicate key exception and log them. Assuming that the table has unique key constraint.

justAbit
  • 4,226
  • 2
  • 19
  • 34
1

Make sure that you have an index on the column[s] you're checking. In general, have a look at the query execution plan that the database is using - this should tell you where the time is going, and so what to do about it.

For Derby db this is how you get a plan and how to read it.

Derby also has a merge command, which can act as insert-if-not-there. I've not used it myself, so you'd need to test it to see if it's faster for your circumstances.

Bob Salmon
  • 411
  • 4
  • 10