69

I need to load a table with a large amount of test data. This is to be used for testing performance and scaling.

How can I easily create 100,000 rows of random/junk data for my database table?

p.campbell
  • 98,673
  • 67
  • 256
  • 322
Mohammad Umair
  • 707
  • 1
  • 7
  • 9

8 Answers

89

You could also use a stored procedure. Consider the following table as an example:

CREATE TABLE your_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, val int);

Then you could add a stored procedure like this:

DELIMITER $$
CREATE PROCEDURE prepare_data()
BEGIN
  -- Start at 0 so the loop inserts exactly 100,000 rows (0 through 99,999).
  DECLARE i INT DEFAULT 0;

  WHILE i < 100000 DO
    INSERT INTO your_table (val) VALUES (i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

When you call it, you'll have 100k records:

CALL prepare_data();
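
Since each single-row INSERT auto-commits by default, this can be slow for large row counts. A common speed-up, sketched below rather than taken from the original answer, is to commit once for the whole loop (the procedure name prepare_data_fast is illustrative):

DELIMITER $$
CREATE PROCEDURE prepare_data_fast()
BEGIN
  DECLARE i INT DEFAULT 0;

  -- One transaction for all rows: commit once instead of 100,000 times.
  START TRANSACTION;
  WHILE i < 100000 DO
    INSERT INTO your_table (val) VALUES (i);
    SET i = i + 1;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;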
Daniel Vassallo
  • 337,827
  • 72
  • 505
  • 443
  • Are you sure it would work for MySQL? I never seen `WHILE` in MySQL queries. – stack Sep 11 '16 at 01:58
  • It's good, but we need some meaningful data with a few more columns. – Indrajeet Gour Dec 26 '16 at 06:35
  • I've tried this with 2 million rows, and it takes a very long time, since it's committing one row at a time. Would be nice to have a way to do lots of them in parallel. I'm thinking generating a query string programmatically might improve things, depending on where the bottlenecks are. Then again, it might just bog down on generating the query string. – Mark Verrey Sep 28 '18 at 21:44
  • This creates infinite records if you forget to increment `i`... which is what I did. – Captain Hypertext Dec 28 '18 at 23:16
18

For cloning an existing row multiple times (data duplication), you could use:

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i <= 100000 DO
    -- Copy the template row (id = 1) once per iteration: 100,000 copies total.
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`
    WHERE id = 1;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
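
This assumes `table` already contains a row with id = 1 to act as the template. If it doesn't, seed one first; the values below are purely illustrative:

-- Hypothetical template row; adjust the values to your schema.
INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
VALUES (1, 1, 'test name', 'test description', NOW());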
michalzuber
  • 5,079
  • 2
  • 28
  • 29
4

Here's a solution with pure math and SQL. Each INSERT ... SELECT re-inserts every existing row with a shifted key, doubling the table, so a handful of repetitions yields hundreds of thousands of rows:

create table t1(x int primary key auto_increment);
insert into t1 () values (),(),();

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 1265 rows affected (0.01 sec)
Records: 1265  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 2530 rows affected (0.02 sec)
Records: 2530  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 5060 rows affected (0.03 sec)
Records: 5060  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 10120 rows affected (0.05 sec)
Records: 10120  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 20240 rows affected (0.12 sec)
Records: 20240  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 40480 rows affected (0.17 sec)
Records: 40480  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 80960 rows affected (0.31 sec)
Records: 80960  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 161920 rows affected (0.57 sec)
Records: 161920  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 323840 rows affected (1.13 sec)
Records: 323840  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 647680 rows affected (2.33 sec)
Records: 647680  Duplicates: 0  Warnings: 0
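
On MySQL 8.0 and later you can get a similar result in a single statement with a recursive CTE. This is a sketch, not part of the original answer, and assumes an empty t1; note that the default recursion depth limit of 1000 has to be raised first:

-- Allow the CTE to recurse 100,000 times (the default limit is 1000).
SET SESSION cte_max_recursion_depth = 100000;

INSERT INTO t1 (x)
WITH RECURSIVE seq (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 100000
)
SELECT n FROM seq;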
Daniil Iaitskov
  • 5,525
  • 8
  • 39
  • 49
  • Does your table only have a single column? I like the concept of just duplicating existing rows, but your query may need tweaking somewhat to use additional columns. You should also be able to insert `NULL` to cover the `AUTO_INCREMENT` columns. – Simon East Oct 16 '17 at 21:33
  • I don't like duplicating rows: when querying for tests, you might not notice where your query fails in the expected result. – Ken Apr 18 '20 at 20:14
3

If you want more control over the data, try something like this (in PHP):

<?php
// mysqli replaces the old mysql_* API, which was removed in PHP 7.
$conn = mysqli_connect(...);
$num = 100000;

$sql = 'INSERT INTO `table` (`col1`, `col2`, ...) VALUES ';
for ($i = 0; $i < $num; $i++) {
  mysqli_query($conn, $sql . generate_test_values($i));
}
?>

where the function generate_test_values returns a string formatted like "('val1', 'val2', ...)". If this takes a long time, you can batch the rows so you're not making so many database calls, e.g.:

for ($i = 0; $i < $num; $i += 10) {
  $values = array();
  for ($j = 0; $j < 10; $j++) {
    $values[] = generate_test_values($i + $j);
  }
  mysqli_query($conn, $sql . join(", ", $values));
}

This would run only 10,000 queries, each inserting 10 rows.
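
Each generated statement is then an ordinary multi-row insert; with illustrative column names and values it would look like this:

INSERT INTO `table` (`col1`, `col2`) VALUES
  ('val1', 'val2'),
  ('val3', 'val4'),
  ('val5', 'val6');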

gmarcotte
  • 385
  • 1
  • 4
0

try filldb

You can either post your own schema or use an existing one, then generate dummy data, export it from the site, and import it into your database.

ganesh konathala
  • 317
  • 1
  • 4
  • 14
0
create table mydata as select * from information_schema.columns;
insert into mydata select * from mydata;
-- Repeating the insert 11 times will give you at least 6 million rows in the table.

Some explanation of this code is worthwhile, because the answer above is far more useful once you understand what it does.

The first line creates a table called mydata and derives its column layout from information_schema.columns, the system view that holds metadata about every column on your MySQL server. CREATE TABLE ... AS SELECT both defines the columns and populates the table in a single step, which is very handy.

The second line inserts the table's current contents back into itself, doubling the row count each time it runs; the comment on the last line suggests repeating it to generate more data.

In my testing, one execution of this script generated 6,956 rows. If you need a quick way to produce some records, this isn't a bad method. For more advanced testing, however, you might want to ALTER the table to add an auto-incrementing primary key: a table without a unique index tends to behave unpredictably, since every insert creates duplicate entries.
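
A minimal sketch of that last suggestion (the column name id is illustrative):

-- Add a unique, auto-incrementing key so duplicated rows stay distinguishable.
ALTER TABLE mydata
  ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;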

easleyfixed
  • 219
  • 1
  • 13
Max
  • 19
  • 1
  • 3
    Please don't post only code as an answer, but also include an explanation of what your code does and how it solves the problem of the question. Answers with an explanation are usually of higher quality and are more likely to attract upvotes – Krzysztof Janiszewski Mar 25 '20 at 16:33
0

I really like the mysql_random_data_load utility from Percona; you can find more details about it here.

mysql_random_data_load is a utility that connects to a MySQL database and fills the specified table with random data. If foreign keys are present in the table, they will also be correctly filled.

This utility has a cool feature: the speed of data generation can be limited.

For example, to generate 30,000 records in the sakila.film_actor table at a speed of 500 records per second, use the following command:

mysql_random_data_load sakila film_actor 30000 --host=127.0.0.1 --port=3306 --user=my_user --password=my_password --qps=500 --bulk-size=1

I have successfully used this tool to simulate a workload in a test environment by running this utility on multiple threads at different speeds for different tables.

Ivan
  • 241
  • 1
  • 10
0

This is a more performant modification to @michalzuber's answer. The only difference is removing the WHERE id = 1, so that each insert copies the entire table, doubling it on every iteration.

Starting from a single seed row, the number of records produced after n iterations is 2^n; so for 10 iterations, 2^10 = 1,024 records, for 20 iterations, 2^20 = 1,048,576 records, and so on.

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i <= 10 DO
    -- Each pass re-inserts every existing row, doubling the table.
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
Bernard Wiesner
  • 961
  • 6
  • 14