1

I have this two querys that I think should behave the same way but they do not.

Query 1 (Always return empty):

SELECT * FROM code
WHERE id = (
                SELECT id FROM code
                WHERE available = 1
                ORDER BY rand()
                LIMIT 1
    )
FOR UPDATE SKIP LOCKED;

Query 2 (Return one random row where available = 1):

SELECT * FROM code
WHERE id = (
        SELECT rc.id FROM
            (
                SELECT id FROM code
                WHERE available = 1
                ORDER BY rand()
                LIMIT 1
            ) as rc
    )
FOR UPDATE SKIP LOCKED;

If I run the subquerys independently they behave the same (Return one random row where available = 1)

Example:

SELECT rc.id FROM
(
  SELECT id FROM code
  WHERE available = 1
  ORDER BY rand()
  LIMIT 1
) as rc;

-- returns 1 row
--
--       id
-- 1 <random_id>
  SELECT id FROM code
  WHERE available = 1
  ORDER BY rand()
  LIMIT 1

-- returns 1 row
--
--       id
-- 1 <random_id>

Table code definiton:

CREATE TABLE `code` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `available` tinyint(4) NOT NULL DEFAULT 1,
  PRIMARY KEY (`id`)
);

I tried both querys but get empty result for the first query and the second gives the expected result. As far as i know they should return the same result, what I am missing?

Shadow
  • 33,525
  • 10
  • 51
  • 64
lukss12
  • 106
  • 4
  • Looks odd. By `independent` do you mean different connections? Which MariaDB version? What does `show engine innodb status\G` show? How does it behave different without `SKIP LOCKED`? – danblack Jun 19 '23 at 23:25
  • 2
    Does this answer your question? [mariadb create function returning random id](https://stackoverflow.com/questions/76431483/mariadb-create-function-returning-random-id) – Georg Richter Jun 20 '23 at 03:04
  • Thank you @GeorgRichter you are correct. Apparently the UNCACHEABLE SUBQUERY is the problem. When using another level of subquery it generates a temporary table and the UNCACHEABLE SUBQUERY is executed over this temporary table that holds the random value. That's why the "Query 2" works. I don't fully understand the subquery caching mechanism. Will look for some docs about it. – lukss12 Jun 21 '23 at 16:51

1 Answers1

0

As @GeorgRichter mentioned in comments section the apparently extrange behaviour of Query 1 is due to subquery caching mechanism. You can read about it in the post shared by Georg.

As you can see in the following EXPLAIN statements when using another level of subquery it generates a temporary table and the UNCACHEABLE SUBQUERY is executed over a temporary table that holds the random value. That's why the "Query 2" works as expected and "Query 1" does not.

EXPLAIN for "Query 1":

EXPLAIN for "Query 1":

EXPLAIN for "Query 2":

EXPLAIN for "Query 2"

I don't know if "Query 1" can be fixed, to behave as "Query 2" without adding another level of subquery, but the main purpose of this question was to understand them and why they behave different. Thanks to @GeorgRichter for pointing me in the correct direction.

EDIT: Another great answer on this matter

JUST A LITTLE REFLEXION: As the previous shared posts indicate the RAND() subquery in "Query 1" is executed once for each row in code table, so exists a probability that the query returns one or multiple rows when this random picked id equals the current row being compared. The probability of each row with available=1 to be picked by the query is 1/TOTAL_AVAILABLE_ROWS and the probability of this query to return n rows is (1/TOTAL_AVAILABLE_ROWS)^n

I think this approach using a common table expression is more performant

WITH random_code AS (
    SELECT id FROM code
    WHERE available = 1
    ORDER BY rand()
    LIMIT 1
)
SELECT * FROM code
JOIN random_code
WHERE code.id = random_code.id
FOR UPDATE SKIP LOCKED;

EXPLAIN:

Explain using common table expression

lukss12
  • 106
  • 4