1

I have a MySQL table which I want to populate with some dummy data for testing (50+).

This table has a foreign key to another table so the dummy data must cross reference from that table but again be random i.e. can't all be the same foreign key.

It also has a date added field which I want to populate with a random date within a year span e.g. any date in the year 2010.

my table structure is:

id, customer_id, date_added, title, total_cost

where id is the primary key, customer_id is the foreign key and date_added is the date field.

What is the best way of doing this? I'd prefer to do it directly in MySQL but if not my site is running on Python so a way of doing this in that would do.

John
  • 21,047
  • 43
  • 114
  • 155

5 Answers5

2

I would not do this in MySQL without outside help from an application written in Python.

There are several requirements built into your statement that are best expressed in a procedural style. SQL is a set-based language; I don't think it lends itself as nicely to the task at hand.

You'll want an application to take in data from a source, do whatever randomization and PII removal that you need, and then construct the test data according to your requirements.

If it's database intended just for test, you might consider an in-memory database that you can populate, modify all you like, and then blow away for your next test. I'm thinking about something like Hypersonic or Derby or TimesTen.

duffymo
  • 305,152
  • 44
  • 369
  • 561
2

quick and dirty solution:

drop table if exists orders;
drop table if exists customers;

create table customers
(
cust_id int unsigned not null auto_increment primary key,
name varchar(255) not null
)
engine=innodb;

create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null,
order_date datetime not null,
foreign key (cust_id) references customers(cust_id) on delete cascade
)
engine=innodb;


drop procedure if exists load_test_data;

delimiter #

create procedure load_test_data()
begin

declare v_max_customers int unsigned default 0;
declare v_max_orders int unsigned default 0 ;
declare v_counter int unsigned default 0 ;
declare v_rnd_cust_id int unsigned default 0;
declare v_base_date datetime;

  set foreign_key_checks = 0;

  truncate table orders;
  truncate table customers;

  set foreign_key_checks = 1;

  set v_base_date = "2010-01-01 00:00:00";

  set v_max_customers = 1000;
  set v_max_orders = 10000; 

  start transaction;

  set v_counter = 0;
  while v_counter < v_max_customers do
        insert into customers (name) values (concat('Customer ', v_counter+1));
    set v_counter=v_counter+1;
  end while;

  commit;

  start transaction;

  set v_counter = 0;
  while v_counter < v_max_orders do

    set v_rnd_cust_id = floor(1 + (rand() * v_max_customers));

        insert into orders (cust_id, order_date) values (v_rnd_cust_id, v_base_date + interval v_counter hour);
    set v_counter=v_counter+1;
  end while;

  commit;

end #

delimiter ;

call load_test_data();

select * from customers order by cust_id desc limit 10;
select * from orders order by order_id desc limit 10;
Jon Black
  • 16,223
  • 5
  • 43
  • 42
1

For testing business rules, I actually prefer carefully thought out data over random data. Either from excel->csv->db or manually created insert statements.

One row for each boundary condition, say:

  • Customer without orders
  • One Customer with zero total cost
  • One customer with foreign characters in the name (because I always forget to deal with it)
  • One customer with max length name
  • One Customer with shit loads of orders (to make sure that the GUI still looks nice)

It makes it really easy to run regression tests because you "know" what the data should look like.

For performance testing, you can do pretty good with random data as long as the data distribution is realistic (which affects the usefulness of indexes). If you have very advanced requirements, your best bet is to use some software built for this purpose.

But often you can generate all the data you need from one single table of integers and clever use of built-in functions:

  • rand() -> Generate random number.
  • mod() -> Used to create repeating sequences (1,2,3,1,2,3)
  • lpad() and rpad() -> For padding strings to specified lengths
Ronnis
  • 12,593
  • 2
  • 32
  • 52
0

As such this question is old and answered but I assume you still need to know this one stored procedure to load dummy data to MySQL which runs from MySQL and auto-populates dummy data according to datatypes.

All you need to specify database-name, table-name and number of records to be populate.

call populate('sakila','film',1000,'N');

(You might want to follow on the Git-Repo for updates as well.)

mysql_user
  • 382
  • 3
  • 11
0

If you really want to get down with some setting up of testing data, you should go the fixture route. This will help set yourself up a pretty nice development environment and may integrate very nicely into your website's framework if you're using one.

You can find a link to the documentation of the fixture module here

If you think that's a little too much work to get all working, look into the MySQLdb module which will help you insert data into your table.

It may be in poor taste to link back to a stackoverflow, but someone has already answered the date question you are asking. You can find that here.

Community
  • 1
  • 1