2

How long should it take to insert about 500,000 records from a CSV file into a MySQL database using Java? The database is hosted on localhost.

Table structure: AI id | varchar(8) | datetime | int | varchar(2). My code needs about 40 minutes to insert 70,000 records. Is there any way to do it faster? Here is the main part of my code:

    CsvReader products = new CsvReader(path);
    products.readHeaders();
    stmt = con.createStatement();
    String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
    PreparedStatement preparedStatement = con.prepareStatement(updateString);

    while (products.readRecord()) {
        v1 = products.get("V1");
        date = format.parse(products.get("Date") + " " + products.get("Hour"));
        java.sql.Date dateDB = new java.sql.Date(date.getTime());
        v2 = products.get("V2");
        v3 = products.get("V3");

        preparedStatement.setString(1, v1);
        preparedStatement.setDate(2, dateDB);
        preparedStatement.setInt(3, Integer.parseInt(v2));
        preparedStatement.setString(4, v3);
        preparedStatement.executeUpdate();
    }

Following your advice, I moved the creation of the statement out of the loop. Now I get 33 records per second, up from 29.

  • move the creation of the statement out of the while loop – wero Jan 16 '16 at 12:29
  • You should not use Java to import data, use mysqlimport utility. – Roman C Jan 16 '16 at 12:32
  • Once I had to insert millions of records based on `pcap` files into the DB. Partitioning the data into chunks and running it using threads was way faster. Don't know how it works in MySQL. I used Oracle DB and had producer threads creating table types (representing a set of DB records) and consumer threads creating DB transactions and calling a stored procedure with those table type parameters to insert the data into the DB. But generally, if you already have CSV files, the fastest way would be to use some SQL Loader alternative http://dev.mysql.com/doc/refman/5.1/en/load-data.html – AndrewMcCoist Jan 16 '16 at 12:33
  • As suggested, create the `PreparedStatement` only once, outside the loop and additionally, consider using batch inserts (`preparedStatement.addBatch()`) as suggested in [this post](http://stackoverflow.com/questions/3784197/efficient-way-to-do-batch-inserts-with-jdbc). – Mick Mnemonic Jan 16 '16 at 12:40
  • I moved the creation of the statement out of loop. I unfortunately have to do this with Java –  Jan 16 '16 at 12:46
  • Try batching the inserts using 1000 row batches. You should see a significant performance improvement. – Mick Mnemonic Jan 16 '16 at 12:57

5 Answers

2

I might opt for using the LOAD DATA statement from MySQL instead of using Java:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' INTO TABLE table;

This would avoid a lot of the overhead you currently have, assuming you are processing each line before inserting it into MySQL.

You can execute a LOAD DATA statement from Java using raw JDBC.
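A minimal sketch of doing that over plain JDBC; the file path, table name, and connection details are placeholders, and the field/line terminators assume a simple comma-separated file:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class LoadDataExample {

    // Builds a LOAD DATA statement for a simple comma-separated file.
    // Path and table name are placeholders, not values from the question.
    static String buildLoadDataSql(String csvPath, String table) {
        return "LOAD DATA LOCAL INFILE '" + csvPath + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
                + " IGNORE 1 LINES"; // skip the CSV header row
    }

    // Runs the statement over an existing JDBC connection. LOCAL INFILE must
    // be enabled on both the server and the driver (with Connector/J, via the
    // allowLoadLocalInfile=true connection property).
    static void load(Connection con, String csvPath, String table) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            stmt.execute(buildLoadDataSql(csvPath, table));
        }
    }
}
```

Since the server parses the file itself, this skips the per-row JDBC round trips entirely.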

Tim Biegeleisen
0

Instead of creating a PreparedStatement inside the while loop, create it once outside and simply set the values inside the loop.

Something like

    String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
    PreparedStatement preparedStatement = con.prepareStatement(updateString);
    while (products.readRecord()) {
        v1 = products.get("V1");
        date = format.parse(products.get("Date") + " " + products.get("Hour"));
        java.sql.Date dateDB = new java.sql.Date(date.getTime());
        v2 = products.get("V2");
        v3 = products.get("V3");

        preparedStatement.setString(1, v1);
        preparedStatement.setDate(2, dateDB);
        preparedStatement.setInt(3, Integer.parseInt(v2));
        preparedStatement.setString(4, v3);
        preparedStatement.executeUpdate();
    }

Additionally, you should commit every fixed number of rows, sized so the transaction fits comfortably in your database engine's memory; otherwise, after a certain number of inserts the system slows down dramatically.
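Chunked commits could be sketched like this; `shouldFlush`, the chunk size, and the row layout are illustrative names, with the table and columns taken from the question:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedInsert {

    // True when `rowsSoFar` rows have accumulated and a commit is due.
    static boolean shouldFlush(int rowsSoFar, int chunkSize) {
        return rowsSoFar > 0 && rowsSoFar % chunkSize == 0;
    }

    // Sketch: same per-row executeUpdate as above, but with autocommit off
    // and an explicit commit per chunk instead of one per row.
    static void insertAll(Connection con, Iterable<Object[]> rows, int chunkSize)
            throws SQLException {
        con.setAutoCommit(false);
        String sql = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
        int count = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);
                ps.setDate(2, (java.sql.Date) row[1]);
                ps.setInt(3, (Integer) row[2]);
                ps.setString(4, (String) row[3]);
                ps.executeUpdate();
                count++;
                if (shouldFlush(count, chunkSize)) {
                    con.commit();   // flush one chunk of work
                }
            }
            con.commit();           // commit the final partial chunk
        }
    }
}
```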

Note that it should generally be possible to insert far more than 70,000 records in 40 minutes, so you probably have a bottleneck somewhere. Is the database local to the Java application, or on a remote server? If it is a remote server, check the connection speed.

Davide Lorenzo MARINO
0

If it's not necessary to insert the data using Java, you can use SQL directly.

Use the code below in your GUI tool (SQLyog etc):

LOAD DATA LOCAL INFILE 'D:\\Book1.csv' INTO TABLE table_name FIELDS TERMINATED BY ','    
ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (column_name1, column_name2);
Droy
0

You should go for batch inserts:

    PreparedStatement prepStmt = con.prepareStatement("Insert query");

    // first set of parameters
    prepStmt.setString(1, parameter1);
    prepStmt.addBatch();

    // next set of parameters
    prepStmt.setString(1, parameter2);
    prepStmt.addBatch();

    // executes all batched inserts in a single call
    int[] numUpdates = prepStmt.executeBatch();

See Which is faster: multiple single INSERTs or one multiple-row INSERT?

How to do a batch insert in MySQL

M Sach
-1

First, you can create the PreparedStatement outside of your loop. You can also refactor your code to use multithreading: since your insert statements don't seem to depend on each other, you can split the data into chunks and process them in parallel.

But there is no absolute answer to your question "How long...". It depends on the machine where MySQL is hosted and the machine where the Java code runs: number of cores, available memory, etc.
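The chunk-and-threads idea could be sketched like this; `insertChunk` stands in for whatever per-chunk insert logic you already have (each worker should use its own Connection), and the pool size is arbitrary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ParallelInsert {

    // Splits the rows into at most `parts` roughly equal chunks.
    static <T> List<List<T>> partition(List<T> rows, int parts) {
        List<List<T>> chunks = new ArrayList<>();
        int size = (rows.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < rows.size(); i += size) {
            chunks.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return chunks;
    }

    // Sketch: one worker per chunk; `insertChunk` is a placeholder for the
    // actual per-chunk insert code, which should open its own Connection.
    static <T> void insertInParallel(List<T> rows, int workers,
                                     Consumer<List<T>> insertChunk)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (List<T> chunk : partition(rows, workers)) {
            pool.submit(() -> insertChunk.accept(chunk));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Whether this actually helps depends on where the bottleneck is; if the disk or the network is saturated, more threads won't speed things up.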

Prim
  • I moved this out of the loop. I know that there is no exact answer, but I think it should be possible to do it faster, so probably I'm doing something wrong. Multithreading is probably too advanced for me :) However, thank you for your reply –  Jan 16 '16 at 12:45