
So I'm working on a PHP project that needs to allow the user to import CSV files into MySQL. The CSV files contain a column with a supposedly unique ID, but in practice there are duplicates. Here is where the tricky part comes in: the duplicates need to go into a separate table and not into the main table.

I have written code to do this, but there has to be a more efficient way. Right now, the script reads the CSV and, for each row, queries the database to see if the ID already exists, then inserts the row into the proper table. These CSV files contain 50k+ rows each, so this gets extremely time consuming.
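Roughly, what it does now looks like this (a simplified sketch, not my exact code — the table names `main`/`duplicates` and the columns here are placeholders):

```php
<?php
// Simplified sketch of the current per-row approach; names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$check      = $pdo->prepare('SELECT 1 FROM main WHERE id = ?');
$insertMain = $pdo->prepare('INSERT INTO main (id, name) VALUES (?, ?)');
$insertDupe = $pdo->prepare('INSERT INTO duplicates (id, name) VALUES (?, ?)');

$fh = fopen('import.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    $check->execute([$row[0]]);        // one round trip per row just to test existence
    if ($check->fetchColumn() !== false) {
        $insertDupe->execute($row);    // ID already present: duplicates table
    } else {
        $insertMain->execute($row);    // new ID: main table
    }
}
fclose($fh);
```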

Any ideas would be great! Thanks!

mike
    Advice is the same as the last time you asked: http://stackoverflow.com/questions/1965001/mysql-duplicates-with-load-data-infile – OMG Ponies Dec 29 '09 at 18:14

2 Answers


Bulk load the CSV into MySQL, then run a query with GROUP BY id HAVING COUNT(id) > 1 and pull those rows out.
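For example (a sketch, not drop-in code: `staging`, `dupes`, `main`, and `id` are placeholder names, the FIELDS/LINES clauses have to match the actual files, and the server must allow LOAD DATA LOCAL):

```php
<?php
// Sketch: all table/column names are placeholders; `staging` matches the CSV layout.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // needed for LOAD DATA LOCAL INFILE
]);

// 1. One statement loads the whole file -- no per-row round trips.
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/import.csv'
            INTO TABLE staging
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'");

// 2. Rows whose id occurs more than once go to the duplicates table.
$pdo->exec("INSERT INTO dupes
            SELECT s.* FROM staging s
            JOIN (SELECT id FROM staging
                  GROUP BY id HAVING COUNT(id) > 1) d USING (id)");

// 3. Everything else goes to the main table.
$pdo->exec("INSERT INTO main
            SELECT s.* FROM staging s
            LEFT JOIN (SELECT id FROM staging
                       GROUP BY id HAVING COUNT(id) > 1) d USING (id)
            WHERE d.id IS NULL");
```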

Ignacio Vazquez-Abrams

Make sure your MySQL table is set to reject duplicate values for the ID column (i.e. give it a UNIQUE or PRIMARY KEY constraint), then do the following:

  1. Run your INSERT statement as if the ID doesn't exist
  2. Test the return: it will fail if the ID already exists
  3. If the insert failed, run a second INSERT statement to put the record into your second table

This way, if 98% of your records aren't duplicates, you will see no speed decrease, and only the few duplicates will incur the extra insert.
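A minimal sketch of those three steps, assuming PDO and a UNIQUE/PRIMARY KEY on `main.id` (table and column names are placeholders); 1062 is MySQL's duplicate-key error code:

```php
<?php
// Sketch: relies on main.id being UNIQUE or PRIMARY KEY; names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$insertMain = $pdo->prepare('INSERT INTO main (id, name) VALUES (?, ?)');
$insertDupe = $pdo->prepare('INSERT INTO duplicates (id, name) VALUES (?, ?)');

$fh = fopen('import.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    try {
        $insertMain->execute($row);       // 1. insert as if the ID doesn't exist
    } catch (PDOException $e) {
        if ($e->errorInfo[1] == 1062) {   // 2. duplicate key -> the insert failed
            $insertDupe->execute($row);   // 3. reroute the record to the second table
        } else {
            throw $e;                     // any other error is a real problem
        }
    }
}
fclose($fh);
```

Catching only error 1062 matters here: any other failure (malformed data, lost connection) should still surface instead of being silently rerouted into the duplicates table.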

Doug Neiner