How to distribute duplicate rows in MySQL output?

Question

I have a MySQL query which outputs rows in random order.

Each row has a name column and there may be times where the query produces duplicate rows (rows having the same value for name) - This is by design.

However, sometimes the query will accidently group duplicate rows directly next to each other in the output.

I'm trying to find a way to evenly distribute rows with duplicate names across the output, so that duplicate rows cannot get accidently grouped together.

This should be done on your display mechanism rather than your data storage mechanism. for example; using PHP code if you're using PHP to output your MySQL data. Also, `RAND()` is extremely inefficient and it would be better to shuffle the outputs in the display code rather than when collecting the data from the table. — Martin, Apr 26 '23 at 10:24
In my app, I show data in a random order, but with pagination (10 rows at a time). To respect pagination in the random dataset, I use RAND() and provide a seed. If I was to do this in the display code, I'd have to SELECT all the rows in the DB, then shuffle with the same seed and select the desired page. — Craig Malton, Apr 27 '23 at 07:26
Please see [This Q&A](https://stackoverflow.com/questions/10729633/php-mysql-pagination-with-random-ordering) about that. — Martin, Apr 27 '23 at 10:50

score 0 · Answer 1 · answered Apr 26 '23 at 10:21

One approach to achieve this would be to modify your MySQL query to include an additional sorting criteria that ensures that rows with the same name are distributed evenly throughout the output.

Here's an example query that accomplishes this:

SELECT * FROM my_table ORDER BY name, RAND();

This query sorts the rows by name first, and then by a random value generated by the RAND() function. The random value ensures that rows with the same name are distributed randomly throughout the output, rather than being grouped together.

Note that using RAND() function in the ORDER BY clause can be computationally expensive and may not scale well for large tables. If this is the case, you might consider using a different deterministic function, such as the MD5 hash of the name, to achieve a similar result.

SELECT * FROM my_table ORDER BY MD5(name), RAND();

This query sorts the rows by the MD5 hash of the name first, and then by a random value generated by the RAND() function. The MD5 hash ensures that rows with the same name are distributed evenly throughout the output, while the random value ensures that the order of the rows is randomized.

This doesn't promise duplicate rows are non-sequential, this merely makes it slightly less likely. — Martin, Apr 26 '23 at 10:25

How to distribute duplicate rows in MySQL output?

1 Answers1