1

My users will upload large csv files. I need to add bulk mysql values (say insert into my_table (name, email) values ('john', 'abc@d.co'), ('foo','def@g.org') ...)

If duplicate emails exist on the database or in the query itself, they shouldn't be inserted (or they should be updated with the new name).

I was running a single query check per value but performance was awful (a 2,000 csv file was taking up to a minute, or more).

How can I optimize this, preferably a great query, or creating a unique composite key (user_id and email combination is unique) and stating insert ignore into my_table ... to ignore duplicates. But how?

Thanks!

lu1s
  • 5,600
  • 3
  • 22
  • 37
  • Check this out: http://stackoverflow.com/questions/548541/insert-ignore-vs-insert-on-duplicate-key-update – dmgig Jun 05 '14 at 19:41
  • This is good too - very applicable to your case as it involves updating multiple rows: http://stackoverflow.com/questions/8991593/on-duplicate-key-update-multiple-columns – dmgig Jun 05 '14 at 19:47

1 Answers1

2

I lifted this directly from here: On Duplicate Key Update - Multiple Columns

If it works you should probably give him/her a bump.

INSERT INTO my_table
  (name, email)
VALUES
  ('john', 'abc@d.co'), 
  ('foo','def@g.org')
ON DUPLICATE KEY UPDATE
  name=VALUES(name)
  email=VALUES(email)
Community
  • 1
  • 1
dmgig
  • 4,400
  • 5
  • 36
  • 47
  • 1
    Thanks a lot @dgig. Having this, I require to make a composite key right? `alter table my_table add unique (id_user, email);` - Thanks again – lu1s Jun 05 '14 at 19:54
  • In your case I think you just want to put a unique index on 'email' so that it updates when the email is the same, right? (your ids will never be the same anyway, and if I'm right - you want to not have duplicate emails). It will use any matching key to move onto the "on duplicate key value" part of the query. So even if you have a primary key on Id and another on email, it will trigger when email is the same, even if ID is not. long story short - not a composite key, just a unique index on "email" – dmgig Jun 05 '14 at 19:59
  • Yeah, there can be duplicate emails, but with different user ids, the table is composed by `id_user, name, email`. My users can upload their contacts, and it may be that two or more users have the same contact in their contact lists, so thats why it will work with composite key. I already tried, and works :) Thanks a lot. – lu1s Jun 05 '14 at 22:16