0

Edit: If it makes any difference, I am using mysql 5.7.19.

I have a table A, and am trying to randomly sample on average 10% of the rows. I have decided that using rand() in a subquery, and then filtering out on that random result would do the trick, but it is giving unexpected results. When I print out the randomly generated value after filtering, I get random values that do not match my main query's "where" clause, so I suppose it is regenerating the random value in the outer select.

I guess I'm missing something to do with subqueries and when things are executed, but I'm really not sure what's going on.

Can anyone explain what I might be doing wrong? I've checked out this post: In which sequence are queries and sub-queries executed by the SQL engine? , and my subquery is correlated so I assume that my subquery is being executed first, and then the main query is filtering off of it. Given my assumptions, I do not understand why the result has values that should have been filtered away.

Query:

select 
  * 
from 
(
  select  
    *, 
    rand() as rand_value
  from    
    A
) a_rand 
where 
  rand_value < 0.1;

Result:

--------------------------------------
| id   | events | rand_value         |
--------------------------------------
| c    |      1 | 0.5512495763145849 | <- not what I expected
--------------------------------------
pigate
  • 351
  • 1
  • 7
  • 16
  • Strange and reproducible, but it does seem like it is working. I am only getting around 10% of my table's results each query, but not sure why it is displaying a value greater than 0.1 in the rand_value column. Not related to subqueries since `SELECT *, rand() as rand_value FROM table HAVING rand_value < 0.1` has the same results. – Devon Bessemer Sep 28 '17 at 01:42
  • 1
    your query does NOT contain a correlated subquery, just a simple `derived table` – Paul Maxwell Sep 28 '17 at 01:46
  • ps: see https://en.wikipedia.org/wiki/Correlated_subquery – Paul Maxwell Sep 28 '17 at 01:57
  • Thanks for the clarification abut correlated subqueries. I'm wondering if it's just a mysql 5.7.19 problem since @Used_By_Already's sql fiddle example seems to work. – pigate Sep 28 '17 at 15:01

1 Answers1

0

I am not able to reproduce using this SQL Fiddle use that link and click the blue [Run SQL] button a few times

CREATE TABLE Table1
    (`x` int)
;

INSERT INTO Table1
    (`x`)
VALUES
    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
;

Query 1:

select 
  * 
from (
      select  
             *
           , rand() as rand_value
      from Table1
      ) a_rand 
where 
  rand_value < 0.1

[Results]:

| x |          rand_value |
|---|---------------------|
| 1 | 0.03006686086772649 |
| 1 | 0.09353976332912199 |
| 1 | 0.08519635823107917 |
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51