I am implementing an application-specific data import feature from one database to another.

I have a CSV file containing, say, 10,000 rows. These rows need to be inserted into or updated in the database.

I am using a MySQL database and inserting from Java.

Some of the rows may already be present in the database, in which case they need to be updated; the rest need to be inserted.

One possible solution is to read the file line by line, check whether each entry exists in the database, and build an insert or update query accordingly. But this may take a long time to build and execute the queries, and sometimes my CSV file may have millions of records.

Is there a faster way to achieve this?

– Mohammed haneef

    For performance you could consider using LOAD DATA INFILE with the REPLACE option; see [LOAD DATA INFILE Syntax](http://dev.mysql.com/doc/refman/5.6/en/load-data.html) – VMai Sep 19 '14 at 13:12
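
Expanding on that comment, here is a minimal sketch of running LOAD DATA LOCAL INFILE with REPLACE from JDBC. The connection URL, credentials, file path, table, and column names are all placeholders, and it assumes the CSV has a header row and the target table has a unique key on id:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CsvBulkLoad {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile must be enabled for LOCAL with recent Connector/J drivers
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             Statement st = con.createStatement()) {
            // REPLACE overwrites rows that collide on the primary/unique key
            // instead of aborting the load with a duplicate-key error
            int rows = st.executeUpdate(
                "LOAD DATA LOCAL INFILE '/tmp/data.csv' " +
                "REPLACE INTO TABLE my_table " +
                "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " +
                "LINES TERMINATED BY '\\n' " +
                "IGNORE 1 LINES " + // skip the CSV header row, if there is one
                "(id, field1, field2)");
            System.out.println(rows + " rows loaded");
        }
    }
}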

4 Answers


I don't know how you determine "is already present", but if it's any kind of database-level constraint (probably on a primary key?) you can make use of the REPLACE INTO statement, which creates a record unless doing so would violate the key, in which case it deletes the conflicting row and inserts the new one.

It works just like INSERT basically:

REPLACE INTO my_table ( id, field1, field2 )
VALUES ( 1, 'value1', 'value2' );

If a row with id 1 exists, it's replaced by these values; otherwise it's created. Note that REPLACE actually deletes the old row and inserts a new one, so any columns you don't supply are reset to their defaults.
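
If you're issuing this from Java, a minimal per-row sketch might look like the following; the table and column names are placeholders matching the example above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ReplaceRow {
    // Upserts one parsed CSV row; my_table and its columns are placeholders.
    static void upsert(Connection con, int id, String field1, String field2)
            throws SQLException {
        String sql = "REPLACE INTO my_table (id, field1, field2) VALUES (?, ?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setString(2, field1);
            ps.setString(3, field2);
            ps.executeUpdate();
        }
    }
}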

– Erik

Given that you're using MySQL, you could use the INSERT ... ON DUPLICATE KEY UPDATE ... statement, which functions similarly to the SQL-standard MERGE statement. MySQL doc reference here and a general Wikipedia reference to SQL MERGE functionality here. The statement would look something like

INSERT INTO MY_TABLE
  (PRIMARY_KEY_COL, COL2, COL3, COL4)
VALUES
  (1, 2, 3, 4)
ON DUPLICATE KEY
  UPDATE COL2 = 2,
         COL3 = 3,
         COL4 = 4

In this example I'm assuming that PRIMARY_KEY_COL is a primary or unique key on MY_TABLE. If the INSERT would fail due to a duplicate value on that key, the UPDATE clause is executed instead. Also note (on the MySQL doc page) that there are some gotchas associated with auto-increment columns on an InnoDB table.
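
Since the inserts come from Java, a rough sketch of batching this statement with JDBC follows. The connection URL, credentials, and the parseCsvSomehow() helper are placeholders; VALUES(col) refers back to the value that would have been inserted, and rewriteBatchedStatements is a Connector/J option that collapses the batch into multi-row statements:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class UpsertBatch {
    public static void main(String[] args) throws Exception {
        // rewriteBatchedStatements lets Connector/J collapse the batch into
        // multi-row statements, which matters at tens of thousands of rows
        String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
        String sql = "INSERT INTO MY_TABLE (PRIMARY_KEY_COL, COL2, COL3, COL4) "
                   + "VALUES (?, ?, ?, ?) "
                   + "ON DUPLICATE KEY UPDATE COL2 = VALUES(COL2), "
                   + "COL3 = VALUES(COL3), COL4 = VALUES(COL4)";
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            con.setAutoCommit(false); // one commit at the end, not one per row
            int count = 0;
            for (String[] row : parseCsvSomehow()) { // stand-in for your CSV parser
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.setString(4, row[3]);
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch(); // flush in chunks to bound memory use
                }
            }
            ps.executeBatch();
            con.commit();
        }
    }

    static List<String[]> parseCsvSomehow() {
        return java.util.Collections.emptyList(); // placeholder, not a real parser
    }
}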

Share and enjoy.


Add a stored procedure in SQL for inserting. In the stored procedure, attempt the insert and, if it fails with a duplicate-key error, do an update instead (in MySQL this is done with a DECLARE ... HANDLER rather than a try/catch block). Then you can simply call this procedure from your program.

Alternatively, inside the procedure body (MySQL's ROW_COUNT() replaces SQL Server's @@ROWCOUNT here):

UPDATE Table1 SET (...) WHERE Column1='SomeValue';
IF ROW_COUNT() = 0 THEN
    INSERT INTO Table1 VALUES (...);
END IF;
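
Calling such a procedure from Java could then be as simple as the sketch below; the procedure name upsert_row and its parameters are hypothetical, so match them to whatever procedure you actually create:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class UpsertViaProcedure {
    // Runs a hypothetical upsert_row(id, field1) procedure for one CSV row.
    static void upsert(Connection con, int id, String field1) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call upsert_row(?, ?)}")) {
            cs.setInt(1, id);
            cs.setString(2, field1);
            cs.execute();
        }
    }
}
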
– VoteCoffee

Do you need to do this often or just once in a while? I need to load CSV files into a database for analysis from time to time, and I created an SSIS data solution with a Data Flow task which loads the CSV file into a table on the SQL Server.

For more info, look at this blog post: http://blog.sqlauthority.com/2011/05/12/sql-server-import-csv-file-into-database-table-using-ssis/

– xeraphim