
I have a scenario where the client receives a daily CSV file of roughly 30,000 to 50,000 rows. The CSV contains both new rows to insert and existing rows that need to be updated.

I am using PHP / MySQL.

Currently I am having to check each row of the CSV and compare with what is in the database.

This is obviously taking a really long time.

If there is nothing in the database it does a straight import which is fast in itself.

I have thought about using a temp table and a join, but I did not know how that would handle the new rows versus the updates.
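For what it's worth, the temp-table idea can be combined with `ON DUPLICATE KEY UPDATE` so that one statement handles both cases. This is only a sketch: the table `products`, its unique key on `sku`, and the columns `sku, name, price` are assumptions stand-ins for the real schema.

```sql
-- Hypothetical schema: `products` has a UNIQUE KEY on `sku`.
CREATE TEMPORARY TABLE staging LIKE products;

LOAD DATA LOCAL INFILE '/path/to/daily.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(sku, name, price);

-- New rows are inserted; rows whose `sku` already exists
-- are updated in place.
INSERT INTO products (sku, name, price)
SELECT sku, name, price FROM staging
ON DUPLICATE KEY UPDATE
  name  = VALUES(name),
  price = VALUES(price);
```

This avoids the per-row check entirely: MySQL resolves insert-vs-update using the unique key while the whole batch runs as set-based operations.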

Clintre
    Looks like you're in need of `ON DUPLICATE KEY UPDATE` – Mike Nov 07 '12 at 23:54
  • @Mike what if you need to validate the input, and the CSV does not contain a key (the table uses auto-increment)? – Baba Nov 08 '12 at 00:21
  • That on duplicate key update is available for file-imports, too. http://dev.mysql.com/doc/refman/5.1/en/load-data.html <- maybe this is the missing link? – hakre Nov 08 '12 at 00:40
  • @Baba, if the only way to tell that it is a duplicate is by comparing all of the columns, you could just add a constraint including all the columns except the ID. [see this question](http://stackoverflow.com/questions/635937) – Mike Nov 08 '12 at 00:56
  • @Mike am only saying if the `CSV` is not from 100% reliable source then .. he can not import directly to database .. they would still be need to loop and validate. Also if the CSV is to be normalized into multiple tables that might also take another approach – Baba Nov 08 '12 at 00:58
  • @Baba validating the data shouldn't be a problem. You could do all the validation first and then execute the queries second if it makes a difference. – Mike Nov 08 '12 at 01:14
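To illustrate the validate-first idea from the comments, here is a rough PHP sketch: clean the rows in a first pass, then send batched `INSERT ... ON DUPLICATE KEY UPDATE` statements instead of one query per row. The table and column names (`products`, `sku`, `name`, `price`) and the connection details are placeholders, not from the question.

```php
<?php
// Sketch only: assumes a `products` table with a unique key on `sku`
// and a CSV laid out as sku,name,price with a header row.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$rows = [];
$fh = fopen('daily.csv', 'r');
fgetcsv($fh); // skip the header row
while (($line = fgetcsv($fh)) !== false) {
    list($sku, $name, $price) = $line;
    // Validation pass: skip (or log) rows that fail the checks.
    if ($sku === '' || !is_numeric($price)) {
        continue;
    }
    $rows[] = sprintf("('%s', '%s', %F)",
        $mysqli->real_escape_string($sku),
        $mysqli->real_escape_string($name),
        (float) $price);
}
fclose($fh);

// Send in chunks of 1000 rows instead of 30,000+ single-row queries.
foreach (array_chunk($rows, 1000) as $chunk) {
    $mysqli->query(
        'INSERT INTO products (sku, name, price) VALUES '
        . implode(',', $chunk)
        . ' ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
    );
}
```

The two-pass structure keeps validation in PHP (as Baba wants) while still letting MySQL decide insert-vs-update per row.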

1 Answer


You have a few options for importing the CSV data into MySQL:

  • mysqlimport - here you can use the --ignore or --replace options. From the Reference Manual:

The --replace and --ignore options control handling of input rows that duplicate existing rows on unique key values. If you specify --replace, new rows replace existing rows that have the same unique key value. If you specify --ignore, input rows that duplicate an existing row on a unique key value are skipped. If you do not specify either option, an error occurs when a duplicate key value is found, and the rest of the text file is ignored.
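A minimal invocation might look like the following. Note that `mysqlimport` derives the table name from the file name (so `daily.csv` loads into a table named `daily`); the delimiters and credentials here are assumptions you would adjust to your file.

```sh
# --replace: input rows that duplicate a unique key value
# overwrite the existing row instead of raising an error.
mysqlimport --local --replace \
  --fields-terminated-by=',' --fields-enclosed-by='"' \
  --lines-terminated-by='\n' \
  -u user -p mydb /path/to/daily.csv
```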

  • Similarly, in LOAD DATA INFILE you can also specify options IGNORE or REPLACE: If you specify REPLACE, input rows replace existing rows and if you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped.
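The `LOAD DATA INFILE` equivalent of the above, again with a hypothetical `products` table and column list. One caveat worth knowing: `REPLACE` here uses MySQL's replace semantics, i.e. the old row is deleted and a new one inserted, which resets any auto-increment ID and any columns not present in the CSV, unlike `ON DUPLICATE KEY UPDATE`, which updates in place.

```sql
LOAD DATA LOCAL INFILE '/path/to/daily.csv'
REPLACE INTO TABLE products
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(sku, name, price);
```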

LOAD DATA INFILE also handles duplicate keys somewhat differently from plain INSERT statements, but those differences only apply when you use neither the IGNORE nor the REPLACE option.

hakre
GregD