INSERT IGNORE or INSERT WHERE NOT IN

Question

I have a 9 million rows table and I'm struggling to handle all this data because of its sheer size.

What I want to do is add IMPORT a CSV to the table without overwriting data.

Before I would of done something like this; INSERT if not in(select email from tblName where source = "number" and email != "email") INTO (email...) VALUES ("email"...)

But I'm worried that I'll crash the server again. I want to be able to insert 10,000s of rows into a table but only if its not in the table with source = "number".

Otherwise I would of used unique on the email column.

In short, I want to INSERT as quickly as possible without introducing duplicates to the table by checking two things. If email != "email" AND source != "number" then insert into table otherwise do nothing. And I dont want errors reports either.

I'm sorry for my bad wording and the question sounding a little silly.

I'm just having a hard time adabting to not been able to test it out on the data by downloading backups and uploading if it goes wrong. I hate large datasets :)

Thank-you all for your time -BigThings

score 3 · Answer 1 · edited Aug 28 '11 at 17:13

3

If you have unique keys on these fields you can use LOAD DATA INFILE with IGNORE option. It's faster then inserting row by row, and is faster then multi-insert as well.

Look at http://dev.mysql.com/doc/refman/5.1/en/load-data.html

edited Aug 28 '11 at 17:13

Maxim Krizhanovsky

26,265
5
59
89

answered Aug 28 '11 at 17:08

Andrej

7,474
1
19
21

+1 for `LOAD DATA INFILE`. If that's available to the OP, it would be the best choice. – Shef Aug 28 '11 at 17:18
I don't have unique keys on because the email can pop up more than once with a different source. I can't use unique but thank-you for the idea. – TheNextBigThing Aug 28 '11 at 17:52
1

What about composite UNIQUE KEY (email, source)? – Andrej Aug 28 '11 at 17:56
@Andrej L. So you're saying this; "INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...) ON DUPLICATE KEY(email,source) UPDATE email = email;" Thank you for the tip ;) – TheNextBigThing Aug 28 '11 at 18:51

score 0 · Accepted Answer · answered Aug 28 '11 at 17:08

0

Set a UNIQUE constraint on email and source columns.

Then do:

INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...)
ON DUPLICATE KEY UPDATE email = email;

INSERT IGNORE will not notify you of any kind of error. I would not recommend it. Neither would I recommend INSERT ... WHERE NOT IN. MySQL has an already well optimized functionality for that. That's why INSERT ... ON DUPLICATE KEY UPDATE is there.

answered Aug 28 '11 at 17:08

Shef

44,808
15
79
90

Won't the `update` statement just update the email to it's current value? Seems like `INSERT IGNORE` would be a better option. – Chris Henry Aug 28 '11 at 17:10
@Chris Henry: Yes, it will update to the current value of `email` column. Read the accepted answer of the question ["INSERT IGNORE" vs "INSERT … ON DUPLICATE KEY UPDATE"](http://stackoverflow.com/questions/548541/insert-ignore-vs-insert-on-duplicate-key-update) for a detailed explanation why should `INSERT IGNORE` not be used. – Shef Aug 28 '11 at 17:12
I don't have unique keys set and I can't set it either because the email be used more than once with a difference value within source. I have to check that the email doesn't exist with same value in source. (SELECT id FROM theTable WHERE email="$email" AND source = "NUMBER") - If it returns 0 rows then it can be inserted. – TheNextBigThing Aug 28 '11 at 17:53
@TheNextBigThing: If you are checking them with an `AND` then you can have a `UNIQUE` constraint. Just like I suggested `email` AND `source` should be `UNIQUE`, that is a composite constraint. – Shef Aug 28 '11 at 17:58
@Shef, You sir have blown my mind. I'm getting a bit confused. This is what is in my mind. INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...) ON DUPLICATE email AND source UPDATE 1=1;// I dont want to update anything. I want to skip it. I'm learning so much. I thought I knew MySQL quite well :) – TheNextBigThing Aug 28 '11 at 18:04
@TheNextBigThing: [Read the manual](http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html). It's easy to learn by examples. You don't have to replace `KEY` with anything, it must be there. You have to do an update, you can't do `1 = 1`. It sets the `email` column to its current value, in essence it doesn't change anything. If you have an `id` column on that table change `email = email` to `id = id`. – Shef Aug 28 '11 at 18:13
@Shef, I actually don't see anything in that answer that suggests `INSERT IGNORE` should not be used. It seems to be desired the use case here. – Chris Henry Sep 05 '11 at 21:27
@Chris Henry: Look at case two and three of the cases when `INSERT IGNORE` does not generate an error. If you think that's a desired result for a serious DBA, then you better think again. – Shef Sep 06 '11 at 16:57

INSERT IGNORE or INSERT WHERE NOT IN

2 Answers2

Linked