I have a CSV file with 50000 entries which I want to import in SQL using batch in JDBC.
What should be the optimal batch size for it?
According to Oracle's official recommendation, the optimal batch size is between 50 and 100.
Source: https://docs.oracle.com/cd/E11882_01/java.112/e16548/oraperf.htm#JJDBC28754
Oracle recommends that you use JDBC standard features when possible. This recommendation applies to update batching as well. Oracle update batching is retained primarily for backwards compatibility.
For both standard update batching and Oracle update batching, Oracle recommends you to keep the batch sizes in the general range of 50 to 100. This is because though the drivers support larger batches, they in turn result in a large memory footprint with no corresponding increase in performance. Very large batches usually result in a decline in performance compared to smaller batches.
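A minimal sketch of standard JDBC update batching with a batch size of 100. The connection URL, table name my_table, and columns col1/col2 are placeholders, and the naive split(",") stands in for a proper CSV parser:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvBatchInsert {
    private static final int BATCH_SIZE = 100; // within the recommended 50-100 range

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:1521/service", "user", "password"); // placeholder URL/credentials
             BufferedReader reader = Files.newBufferedReader(Paths.get("data.csv"));
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO my_table (col1, col2) VALUES (?, ?)")) { // placeholder table/columns

            conn.setAutoCommit(false); // commit once at the end instead of per row

            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(","); // naive split; use a CSV library for quoted fields
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch();

                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send the accumulated rows in one round trip
                }
            }
            ps.executeBatch(); // flush the final partial batch
            conn.commit();
        }
    }
}
```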
Have a nice day
50k records is not a large dataset. A bigger batch size will help, but assume each network round trip to the database server takes 10 ms:
Batch size 50 => 50,000 rows / 50 rows per batch * 10 ms latency = 10,000 ms = 10 s of latency overhead
Batch size 100 => 50,000 rows / 100 rows per batch * 10 ms latency = 5,000 ms = 5 s of latency overhead
Start by setting a reasonable batch size for the batch insert statements, and then measure how long it actually takes to insert the rows. Remember to vacuum after the bulk insert.
If 50k records take 1 minute to insert, you need to focus on optimizing the insertion process rather than the JDBC batch size, since only a fraction of the total time is spent in latency overhead.
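If you want to see where the time actually goes, a rough timing sketch is below; runBatchInsert() is a hypothetical stand-in for the batched insert loop, and the row count is just the example's 50,000:

```java
public final class InsertTimer {
    public static void main(String[] args) {
        int rowCount = 50_000;          // rows your batch loop will insert
        long start = System.nanoTime();

        runBatchInsert();               // placeholder for the JDBC batch insert

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("Inserted %d rows in %d ms (%.3f ms per row)%n",
                rowCount, elapsedMs, (double) elapsedMs / rowCount);
    }

    private static void runBatchInsert() {
        // call your batched insert here
    }
}
```

Compare the measured total against the estimated latency overhead above; if the total is much larger, the bottleneck is elsewhere.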
For larger data sets you should not be using JDBC at all. There are tools designed for bulk loading, e.g. Oracle's SQL*Loader.