3

For development purposes I need a table with around 1 million to 100 million rows, and my current method isn't fast at all.

What do developers usually do to fill their tables with millions of rows so they can test how fast their programs handle them?

My current method uses 10 for loops, but it's really slow for the number of rows I need.

So what can I do to insert millions of rows rapidly? What do pro developers do in these cases?

Debels
  • http://stackoverflow.com/q/1626059/40822 – dotjoe Jul 19 '14 at 03:49
  • @dotjoe I don't have a file to load into it :/ – Debels Jul 19 '14 at 03:51
  • You could generate the data to a file and then use that to load into a database many times. – dotjoe Jul 19 '14 at 03:55
  • *What do pro developers do in this cases?* Well, they're not using MySQL for one. [They're too busy bulk copying their data into their database](http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx). Well, I suppose that's a lie because bulk copying is so quick there's not much to be busy with. [I'm sorry about your choice of RDBMS.](http://grimoire.ca/mysql/choose-something-else) – ta.speot.is Jul 19 '14 at 04:03
  • @ta.speot.is ok, but which one would you suggest then? – Debels Jul 19 '14 at 04:13
  • @Debels Do you need to load the data from some source (e.g. file) meaning you need some actual meaningful values or your test data can be generated on the fly? – peterm Jul 19 '14 at 05:34
  • Possible duplicate of [How to insert 20 million record into MySQL database as fast as possible](https://stackoverflow.com/q/8474926/608639). – jww Mar 22 '19 at 17:26

5 Answers

9

It's not clear from your question what the nature of the data you need to insert is, but if it can be generated on the fly, then the fastest way is to do it in a single query (this one inserts 1 million customers). The derived table cross-joins six ten-row digit tables to generate the numbers 1 through 1,000,000:

INSERT INTO customers (id, customer_name)
SELECT n, CONCAT('Customer', n)
  FROM
(
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t

Here is a SQLFiddle demo for 10k rows.

Recommended reading:

peterm
  • This is what worked for me; I just had to replace the table and field names with the ones I actually have in my db, and I added IGNORE after the SELECT so it would skip ids already inserted. Life saver! – ccrez Jul 05 '23 at 18:09
4

Usually, the slowest part of an insert is updating indexes. To speed up mass inserts, disable the indexes, send the inserts, then re-enable them.

Also, use one of the multiple-insert syntaxes rather than issuing an INSERT statement for each individual row.
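
As a rough sketch of both points, assuming a hypothetical MyISAM table named test_data (ALTER TABLE ... DISABLE KEYS only defers non-unique index maintenance, and only on MyISAM; for InnoDB you would instead drop secondary indexes and re-create them after the load):

-- Hypothetical MyISAM table; DISABLE KEYS defers non-unique index updates.
ALTER TABLE test_data DISABLE KEYS;

-- Multi-row INSERTs, sent in large batches.
INSERT INTO test_data (id, payload) VALUES
    (1, 'row 1'),
    (2, 'row 2'),
    (3, 'row 3');

-- Rebuild the deferred indexes once all the data is in.
ALTER TABLE test_data ENABLE KEYS;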

3

I guess you insert your records like below:

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");

Instead use this:

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2");

Note: To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
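
One way to act on that note, sketched with the same placeholder table and values as above: send the batched rows over a single connection inside one transaction, so there is only one commit (and one flush) at the end rather than one per statement. Batch sizes of a few thousand rows per INSERT are a common starting point.

START TRANSACTION;

INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),
                                                 ("data1", "data2"),
                                                 ("data1", "data2");
-- ...repeat multi-row INSERTs for the remaining batches...

COMMIT;  -- a single commit for the whole load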

Bla...
3

The fastest solution is not to load data at all, but start with a pre-populated data directory.

If you are just initializing data for development/testing, then don't load data at all. Start up a new instance of MySQL from a physical backup.

You can use Percona XtraBackup to take a physical backup of any running MySQL instance without blocking traffic.

Then prepare the backup, and it's instantly usable as a live data directory.

Then you can get a new test instance set up as quickly as you can run cp to copy that data directory, and launch an instance of mysqld, specifying --datadir to your copy of the backup.

You can repeat this, copying the physical backup to as many dev/test instances as you need.


Second option: don't use INSERT, use LOAD DATA INFILE. This can be an order of magnitude faster than using INSERT, even with prepared statements and multi-row syntax.
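
A minimal sketch of that, using a hypothetical file /tmp/customers.csv and the customers table from the first answer; depending on the server's secure_file_priv and local_infile settings you may need LOAD DATA LOCAL INFILE, or to place the file in the permitted directory:

-- Hypothetical CSV file with one "id,name" pair per line.
LOAD DATA INFILE '/tmp/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, customer_name);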

Third option: prepare your bulk data as a CSV file, move it into your MySQL data directory, then create a table with ENGINE=CSV pointing to that file. Voila, you have a table full of data. Then use ALTER TABLE ... ENGINE=InnoDB to convert it to a real internal table.
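
A sketch of that third option with hypothetical names. CSV-engine tables require NOT NULL columns and no indexes, and creating the table produces an empty customers.CSV file in the schema's directory, which you then overwrite with your prepared data:

-- CSV-engine table; its backing file is customers.CSV in the schema's directory.
CREATE TABLE customers (
  id INT NOT NULL,
  customer_name VARCHAR(100) NOT NULL
) ENGINE=CSV;

-- (Swap your prepared customers.CSV file into the schema directory here.)
FLUSH TABLES;

-- Convert to a real internal table; add indexes afterwards.
ALTER TABLE customers ENGINE=InnoDB;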

Bill Karwin
-1
  1. Write a script to generate the data (a sketch of one approach follows below)
  2. Use its output to insert into the database
  3. Run it overnight
  4. Enjoy that pint while it is doing its stuff
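
A minimal sketch of steps 1–2 done inside MySQL itself, using a hypothetical stored procedure and table (the names are illustrative only, not from the answer):

DELIMITER //
-- Hypothetical generator: fills test_data with `how_many` synthetic rows.
CREATE PROCEDURE generate_test_data(IN how_many INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  START TRANSACTION;
  WHILE i <= how_many DO
    INSERT INTO test_data (id, payload) VALUES (i, CONCAT('row ', i));
    SET i = i + 1;
  END WHILE;
  COMMIT;
END //
DELIMITER ;

CALL generate_test_data(1000000);  -- run it overnight for larger counts
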
Ed Heal