
This may have been asked a lot, but I am trying to insert 4 million records into a database using Java. I did a lot of googling and tried both Access and MySQL; they performed almost the same. With MySQL I tried statement.addBatch(), but it still takes forever. The question is: what is the best time I can get, and what is the best way?

        counter++;
        // placeholder: the actual VALUES list is built from each parsed record
        String sqlQuery = "INSERT INTO employees VALUES(...)";

        sqlState.addBatch(sqlQuery);

        if (counter == 1000) {

            sqlState.executeBatch(); // send the accumulated 1000 statements

            counter = 0;
        }

Also, am I using the batch correctly?

Abdullah Asendar
  • If you **know** the data is consistent you can probably remove all constraints before inserting and add them when done. This should be a tremendous increase in performance. – Reut Sharabani Sep 01 '15 at 16:58
  • How are the records currently stored? Are they in an SQL file? If so, you should be able to run it as a script directly without the need to use Java. Just a thought. –  Sep 01 '15 at 17:07
  • I'd recommend you to check these: http://stackoverflow.com/q/11389449/1065197, http://stackoverflow.com/q/2993251/1065197 – Luiggi Mendoza Sep 01 '15 at 17:11
  • That sounds like an initial load of data. You might want to insert directly, bypassing Java. For MySQL, that would be using [LOAD DATA](https://dev.mysql.com/doc/refman/5.0/en/loading-tables.html). And as @ReutSharabani said, removing all indexes and foreign keys during the load will help immensely. It's *much* faster to rebuild an index of 4 million records than it is to insert them one at a time. – Andreas Sep 01 '15 at 17:24 (see the JDBC sketch after these comments)
  • @ReutSharabani I am not following you; I am new to all this. – Abdullah Asendar Sep 01 '15 at 19:39
  • @Welshboy The records are in an XML file; I am parsing and inserting. If I save them in a .txt file after parsing, will that reduce the time? – Abdullah Asendar Sep 01 '15 at 19:40
  • When you insert a row into an SQL table, the database validates the row against some constraints (primary key, foreign keys, ...). The row is also indexed (possibly multiple times). This takes a lot of time and is unnecessary if you **know** the data is valid and you only intend to query it **after** the bulk insert. – Reut Sharabani Sep 02 '15 at 05:13
  • @ReutSharabani I managed to insert 1 billion records in 3 minutes, is that good? – Abdullah Asendar Sep 02 '15 at 06:44
  • "Good" can only be determined with respect to your use case... – Reut Sharabani Sep 02 '15 at 06:45

1 Answer


Reuse a single PreparedStatement, set its parameters for each record, and add it to the batch.

    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO employees (f1, f2, f3) VALUES (?, ?, ?)");
    int i = 0;
    while (hasMoreRecords) {
        ps.setString(1, "v1");
        ps.setString(2, "v2");
        ps.setString(3, "v3");
        ps.addBatch();
        if (++i % 1000 == 0) {
            ps.executeBatch(); // send each 1000-row chunk to the server
        }
    }
    ps.executeBatch(); // flush the final partial batch

On its own this won't make a huge difference, but it is the correct way to batch.
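
One MySQL-specific setting worth checking, offered as a sketch of an assumption about your driver rather than something from the question: by default Connector/J sends each batched INSERT as a separate statement, and only rewrites the batch into multi-row INSERTs when rewriteBatchedStatements=true is set on the connection URL. Combined with committing once per chunk instead of autocommitting every row, this is often the biggest single speed-up:

    // Hypothetical URL; host, schema, and credentials are placeholders.
    String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
    Connection conn = DriverManager.getConnection(url, "user", "pass");
    conn.setAutoCommit(false); // one commit per chunk, not one per row
    // ... run the batched PreparedStatement loop above ...
    conn.commit();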

Does your INSERT use a sub-query? Maybe that sub-query is slow.

Sam Barnum