1

I have a 9 million rows table and I'm struggling to handle all this data because of its sheer size.

What I want to do is add IMPORT a CSV to the table without overwriting data.

Before I would of done something like this; INSERT if not in(select email from tblName where source = "number" and email != "email") INTO (email...) VALUES ("email"...)

But I'm worried that I'll crash the server again. I want to be able to insert 10,000s of rows into a table but only if its not in the table with source = "number".

Otherwise I would of used unique on the email column.

In short, I want to INSERT as quickly as possible without introducing duplicates to the table by checking two things. If email != "email" AND source != "number" then insert into table otherwise do nothing. And I dont want errors reports either.

I'm sorry for my bad wording and the question sounding a little silly.

I'm just having a hard time adabting to not been able to test it out on the data by downloading backups and uploading if it goes wrong. I hate large datasets :)

Thank-you all for your time -BigThings

Community
  • 1
  • 1
TheNextBigThing
  • 345
  • 1
  • 3
  • 10

2 Answers2

3

If you have unique keys on these fields you can use LOAD DATA INFILE with IGNORE option. It's faster then inserting row by row, and is faster then multi-insert as well.

Look at http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Maxim Krizhanovsky
  • 26,265
  • 5
  • 59
  • 89
Andrej
  • 7,474
  • 1
  • 19
  • 21
  • +1 for `LOAD DATA INFILE`. If that's available to the OP, it would be the best choice. – Shef Aug 28 '11 at 17:18
  • I don't have unique keys on because the email can pop up more than once with a different source. I can't use unique but thank-you for the idea. – TheNextBigThing Aug 28 '11 at 17:52
  • 1
    What about composite UNIQUE KEY (email, source)? – Andrej Aug 28 '11 at 17:56
  • @Andrej L. So you're saying this; "INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...) ON DUPLICATE KEY(email,source) UPDATE email = email;" Thank you for the tip ;) – TheNextBigThing Aug 28 '11 at 18:51
0

Set a UNIQUE constraint on email and source columns.

Then do:

INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...)
ON DUPLICATE KEY UPDATE email = email;

INSERT IGNORE will not notify you of any kind of error. I would not recommend it. Neither would I recommend INSERT ... WHERE NOT IN. MySQL has an already well optimized functionality for that. That's why INSERT ... ON DUPLICATE KEY UPDATE is there.

Shef
  • 44,808
  • 15
  • 79
  • 90
  • Won't the `update` statement just update the email to it's current value? Seems like `INSERT IGNORE` would be a better option. – Chris Henry Aug 28 '11 at 17:10
  • @Chris Henry: Yes, it will update to the current value of `email` column. Read the accepted answer of the question ["INSERT IGNORE" vs "INSERT … ON DUPLICATE KEY UPDATE"](http://stackoverflow.com/questions/548541/insert-ignore-vs-insert-on-duplicate-key-update) for a detailed explanation why should `INSERT IGNORE` not be used. – Shef Aug 28 '11 at 17:12
  • I don't have unique keys set and I can't set it either because the email be used more than once with a difference value within source. I have to check that the email doesn't exist with same value in source. (SELECT id FROM theTable WHERE email="$email" AND source = "NUMBER") - If it returns 0 rows then it can be inserted. – TheNextBigThing Aug 28 '11 at 17:53
  • @TheNextBigThing: If you are checking them with an `AND` then you can have a `UNIQUE` constraint. Just like I suggested `email` AND `source` should be `UNIQUE`, that is a composite constraint. – Shef Aug 28 '11 at 17:58
  • @Shef, You sir have blown my mind. I'm getting a bit confused. This is what is in my mind. INSERT INTO table_name(email, source, ...) VALUES ('email', 'source', ...) ON DUPLICATE email AND source UPDATE 1=1;// I dont want to update anything. I want to skip it. I'm learning so much. I thought I knew MySQL quite well :) – TheNextBigThing Aug 28 '11 at 18:04
  • @TheNextBigThing: [Read the manual](http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html). It's easy to learn by examples. You don't have to replace `KEY` with anything, it must be there. You have to do an update, you can't do `1 = 1`. It sets the `email` column to its current value, in essence it doesn't change anything. If you have an `id` column on that table change `email = email` to `id = id`. – Shef Aug 28 '11 at 18:13
  • @Shef, I actually don't see anything in that answer that suggests `INSERT IGNORE` should not be used. It seems to be desired the use case here. – Chris Henry Sep 05 '11 at 21:27
  • @Chris Henry: Look at case two and three of the cases when `INSERT IGNORE` does not generate an error. If you think that's a desired result for a serious DBA, then you better think again. – Shef Sep 06 '11 at 16:57