1

I'm trying to select first names from a lookup table at random in MySQL to build a test dataset. I have a table with 200 first names, genders and a row id going from 1 to 200. Something like this:

id   firstname   gender
1    Aaron       m
2    Adam        m
3    Alan        m
etc...

I'm selecting from this table using a random generator with the following query:

SELECT id, firstname FROM firstname WHERE id = round(1 + (rand() * 199));

I am expecting the random number to tally up with exactly one id from the lookup table, thus producing a single results like

id   firstname
43   Jason

Running the code again and again instead gives me a selection of

  • single rows (as above)
  • or multiple rows like

id firstname 29 Ethan 147 Jean

  • or no results (just NULL in both fields).

If I run the random generator on its own, it will always generate a number between 1 and 200. As you can see below, the id field is INT, and the query behaves the same way if I cast the result as SIGNED. I have also tried to use FLOOR instead of ROUND, just to see if that worked any differently - alas, no.

Can anyone tell my why the anomaly? What am I missing?

Here is some code to create the first 20 rows of the original table for testing purposes:

-- First Name --
drop table if exists firstname;

  CREATE TABLE firstname (
  id INT NOT NULL,
  firstname VARCHAR(20) NOT NULL,
  gender VARCHAR(1) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE (firstname)
  );

  INSERT INTO firstname
  (id,firstname,gender)
  VALUES
  (1,"Aaron","m"),
(2,"Adam","m"),
(3,"Alan","m"),
(4,"Albert","m"),
(5,"Alexander","m"),
(6,"Andrew","m"),
(7,"Anthony","m"),
(8,"Arthur","m"),
(9,"Austin","m"),
(10,"Benjamin","m"),
(11,"Billy","m"),
(12,"Bobby","m"),
(13,"Brandon","m"),
(14,"Brian","m"),
(15,"Bruce","m"),
(16,"Bryan","m"),
(17,"Carl","m"),
(18,"Charles","m"),
(19,"Christian","m"),
(20,"Christopher","m");
zoltansn
  • 67
  • 1
  • 12
  • you can use `SELECT id, firstname FROM firstname order by rand() limit 1` instead of joins and generators. – Dmytro Kh. Jun 20 '18 at 08:47
  • nice and compact, but having ran a number of tests the results are relatively homogenic in that the variation in the number of each result (I'm filling a large table with random generated records) is rather quite small. Across 200 names and 10K records the number of occurrences will be somewhere between 30 odd and 70 with majority being around 50. I would need a completely random result with a number of names only appearing maybe once or even not at all while one or two being as frequent as 100 or more. I've also noticed a pattern of blocks of 8 names with decreasing frequency and back again. – zoltansn Jul 18 '18 at 13:26

1 Answers1

0

Since RAND() is not deterministic, the WHERE condition is evaluated/executed once per each row. Thus each row has a chance of 1/199 to be selected. You can use a subquery in the FROM clause (derived table) instead to generate exactly one random number:

SELECT f.id, f.firstname
FROM firstname f
JOIN (SELECT floor(rand()*200)+1 as rnd) r ON r.rnd = f.id
Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • Thank you Paul, your code works like a charm, I just want to make sure I understand exactly what you're saying though? So when the query runs the WHERE clause runs on every one of the rows in the lookup table? Meaning, when it looks at id 1 and the random generator happens to be 1 it will add it to the results, and so on until 200? – zoltansn Jun 19 '18 at 16:42
  • Yes - For every row a new random number is generated in the WHERE clause. This way you can get no rows, multiple rows or even (though very unlikely) all rows from 1 to 200. – Paul Spiegel Jun 19 '18 at 16:46