I have a MySQL Table that looks like this:

MySQL Table: status

The SQL to create the structure is:

CREATE TABLE `status` (
`id` INT(11) NOT NULL,
`responseCode` INT(3) NOT NULL DEFAULT '503',
`lastUpdate` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

It stores a unique id, a responseCode, and a lastUpdate timestamp. The responseCode is an HTTP response code: 404, 500, 503, 200, etc.

Each id corresponds to a URL to which I make an HTTP request, and I record in this table the time I made the request and the response code received.

The script makes this query against the status table:

SELECT id FROM status WHERE lastUpdate < 'XXXX' OR 
(responseCode != 200 AND responseCode != 404) 
ORDER BY id DESC LIMIT 100

Here XXXX would be a cutoff date: anything older than that needs to be refreshed regardless of the response code. Further, I want to reattempt the HTTP request if I didn't get a 200 or 404, regardless of the lastUpdate date. I LIMIT to 100 because I only run 100 at a time, then have it sleep for a while and do another 100 later, and so on.

Anyway, all that's fine, but what I want to do is populate the table ahead of time with, say, a series like this:

(1, 503, NOW()), (2, 503, NOW()), (3, 503, NOW()) ... (100000, 503, NOW())

Notice that only the id is incrementing, though it may not necessarily start at 1 for my needs. I want the table pre-populated like this because then the query above can keep grabbing the ids that need to be reattempted, and I'd like to never have to insert anything more into the status table, as the ids are finite and will not change (but there are many of them).

I tried using Java (though PHP, C#, or whatever else is the same concept; it doesn't matter to me which language I use here):

PreparedStatement st = conn.prepareStatement("INSERT INTO status VALUES (?,default,default)");

for( int i = 1; i <= 100000; i++ ) {
    st.setInt(1,i);
    st.addBatch();
}

System.out.println( "Running batch..." );
st.executeBatch();
System.out.println( "Batch done!" );

This starts the inserts, but the issue is that it takes an extraordinary amount of time to populate the table (I don't have an exact time, but it was running for hours). So, my question boils down to: is there an easy and efficient way to populate a MySQL table with a large number of rows like this?

user17753
  • Added a pure sql solution to my answer, let me know if you find anything faster. – xception Oct 08 '12 at 21:55
  • Possible duplicate of [How to insert 20 million record into MySQL database as fast as possible](https://stackoverflow.com/q/8474926/608639). – jww Mar 22 '19 at 17:28

4 Answers


Generally speaking, you can use any one or more of the following:

  • Start a transaction, do inserts, commit
  • Pack multiple values into a single insert into query
  • Drop any constraints before doing insert and reinstate constraints after the mass insert (except possibly primary key, not very sure about it though)
  • Use insert into ... select if suitable

The first (using transactions) is most likely to help, but I'm not sure whether it works on MyISAM tables; with InnoDB it does a very good job. I only use MySQL when I'm forced to; I prefer PostgreSQL.
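As a rough sketch of the first two points combined (with MyISAM the transaction part is effectively a no-op, so it mainly helps on InnoDB), the inserts could be grouped like this:

START TRANSACTION;
-- pack many rows into each INSERT, keeping each statement well under max_allowed_packet
INSERT INTO status (id, responseCode, lastUpdate) VALUES
  (1, 503, NOW()),
  (2, 503, NOW()),
  (3, 503, NOW());
COMMIT;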

In your specific case, inserting 100000 rows of data, you could do the following:

INSERT INTO status(id, responseCode, lastUpdate) SELECT @row := @row + 1 as row, 503, NOW() FROM 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4, 
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t5, 
(SELECT @row:=0) t6;

Tested this on my machine, got:

Query OK, 100000 rows affected (0.70 sec)
Records: 100000  Duplicates: 0  Warnings: 0

I'm pretty sure you can't get much faster than that for 100000 rows.
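The question mentions that the ids may not necessarily start at 1; the same generator can be seeded with an offset by changing the derived table that initializes @row. A shortened sketch (two digit tables, so only 100 rows) using a hypothetical starting id of 50001:

INSERT INTO status(id, responseCode, lastUpdate) SELECT @row := @row + 1, 503, NOW() FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(SELECT @row := 50000) t3; -- @row starts at 50000, so the first generated id is 50001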

xception
  • If you're running many insert statements, grouping them into transactions prevents the db from writing to disk after each one; it ensures they all get committed to disk at once at the end of the transaction. – Mihai Stancu Oct 08 '12 at 20:40
  • This does seem quite fast, can you explain briefly what is going on in that query? – user17753 Oct 09 '12 at 13:35
  • I create consecutive numbers by joining 5 derived tables each containing 0 to 9, and then select number, constant, constant ... which is very fast ... then insert all 100000 entries in a single transaction since it's a single query. – xception Oct 09 '12 at 16:20
  • Oh, could you please checkmark the answer since it works and I don't think it can be done faster than this (at least not in mysql); however, if anyone comes up with a faster solution later you can just mark that as the correct answer instead. – xception Oct 09 '12 at 16:30
  • @user17753 I wrote a PostgreSQL equivalent for that huge select, hopefully it will be easier to understand what I did http://sqlfiddle.com/#!1/d41d8/501 ... I only wrote it up to 20 instead of 100000 because I didn't want to overload the sqlfiddle servers too much – xception Oct 09 '12 at 16:53
  • Perfect solution for duplicating rows when the data is exactly the same. Before, the insert took 20s per 10000 rows. Now 1s. Really perfect. – workdreamer Nov 24 '14 at 23:38
  • Just awesome ;) – Stranger Oct 03 '16 at 12:51

How about setting AUTO_INCREMENT on the primary key?
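For the status table from the question, that could look something like this (a sketch, assuming the column is not yet auto-incrementing):

ALTER TABLE `status` MODIFY `id` INT(11) NOT NULL AUTO_INCREMENT;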

Then inserting the first hundred (or thousand) rows whichever way you like (your example or the example DocJones gave you).

Then using

INSERT INTO status SELECT NULL, '503', NOW() FROM status;

...repeatedly a few times. This should make the table double in size every time.

The NULL in the first slot of the SELECT ensures the AUTO_INCREMENT kicks in and increments id.

If you want to grow the table even faster you can do

INSERT INTO status SELECT NULL, '503', NOW() FROM status AS t1 CROSS JOIN status t2;

...repeatedly a few times, which would make the table grow to the square of the previous size plus the previous size (100^2 + 100).

This also allows you to customize the values inserted. For example, if you want to create "random" responseCodes you can use something like CONCAT(ROUND(1+RAND()*4), '0', ROUND(RAND()*5)), which will give you response codes ranging from 100 to 505.
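For instance, a doubling step with randomized response codes might look like this (a sketch, assuming the AUTO_INCREMENT change above):

INSERT INTO status SELECT NULL, CONCAT(ROUND(1+RAND()*4), '0', ROUND(RAND()*5)), NOW() FROM status;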

Mihai Stancu
  • I think this idea is really keen. I will try this out. – user17753 Oct 08 '12 at 20:34
  • Careful with the `CROSS JOIN`: you can enter 10 values manually, then run the `CROSS JOIN` and get 10+10^2 = 110; repeat the `CROSS JOIN` and poof, you've got 110+110^2 = 12,210; at the third repetition you're already at 149,096,310 - one hundred forty nine **million** entries - that'll chow down on some disk-io and take some time to write. – Mihai Stancu Oct 08 '12 at 20:37

PHP solution to load them in batches of 100:

for ($i = 0; $i < 100000; $i += 100) {
  $vals = implode(', ',
                  array_map(function($j) { return "($j, default, default)"; },
                            range($i + 1, $i + 100)));
  mysqli_query($dbh, 'INSERT INTO status VALUES ' . $vals) or die(mysqli_error($dbh));
}
Barmar

You are creating one LARGE batched statement to be executed. Try splitting it into smaller batches by calling executeBatch() every 1000 increments of i (using i modulo 1000) inside the loop. That should speed up the process:

for( int i = 1; i <= 100000; i++ ) {
    st.setInt(1,i);
    st.addBatch();
    if (i % 1000 == 0) {
       st.executeBatch();
    }
}
DocJones
  • I've noticed that executing the batch (as in my question) still populates the table actively (e.g. I can watch the db fill) same as your snippet here. Though, I don't experience any major difference in the performance of the inserts. – user17753 Oct 08 '12 at 20:28