Why does SELECT results differ between mysql and sqlite?

Question

I'm re-asking this question in a simplified and expanded manner.

Consider these sql statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1      
----------  ----------
106         4.5       
107         4.0

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to sqlite-users@sqlite.org about the issue.

Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

enter image description here

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that the newer versions (some times after the version I'm using) supports inserting multiple records as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);

Just for kicks I ran this to SQL what SQL Server would produce and it complained `avg2` and `avg1` didn't exist. I replaced them with `MAX(T2.score)` and `MAX(T1.score)` and it gave the SQLite result. When I created the table with `score REAL` it gave the MySQL result. Perhaps your MySQL schema is different to sqlites? — ta.speot.is, Apr 16 '12 at 09:42
@ta.speot.is: can you try adding `as` as in `avg(T2.score) as avg2` (two occurences)? — , Apr 16 '12 at 09:44
Doesn't work. Pretty sure SQL Server doesn't play dice when it comes to using aliases in `WHERE`, `GROUP BY` or `HAVING`. — ta.speot.is, Apr 16 '12 at 09:45
trying parts of the statement under sqlite I have this `SQL error: no such function: exists`. Keywords that are working under mysql may not in other databases systems. I would personally use the `IN` keyword for testing the results of your subquery — Bathz, Apr 16 '12 at 09:45
The documentation doesn't help. http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html - The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL). http://www.sqlite.org/lang_aggfunc.html - The result of avg() is always a floating point value ... even if all inputs are integers. And while we're at it: http://sqlfiddle.com/#!5/1123f/1 — ta.speot.is, Apr 16 '12 at 09:47
@ta.speot.is: yes I think mysql's resulte are indeed correct; and even sqlite seems to say that `avg` returns a `real`: `select typeof(avg(score)) from foo;` returns `real`. — , Apr 16 '12 at 09:51
It looks like you have discovered a genuine bug in SQLite (which does not surprise me a slightest bit). It is definitely not the precision, because it does not work even when you replace `avg` with `sum`, which does not require division. — Sergey Kalinichenko, Apr 16 '12 at 09:51
@sixfeetsix I run sqlite v.2.8.17 so yes maybe my advice does not help :) . However in my short dev life I never used the keyword `EXISTS` except for table creation and so on and when reading http://stackoverflow.com/questions/24929/difference-between-exists-and-in-in-sql I may think it is not the best keyword for what you are trying to do. — Bathz, Apr 16 '12 at 09:51
@Bathz: yes sqlite 2.X doesn't support that select statement (perhaps because `EXISTS` wasn't supported then...); in any case, I have nothing for or against `EXISTS`. — , Apr 16 '12 at 09:54
Ok, I took a look again with the two keywords versions, and indeed results are the same. Sqlite seems to incorrectly handle whether the subquery or the avg1 variable. Because when I hardcode the subquery result in a `IN` clause or the avg1 var in a `EXISTS` clause it returns the correct result... — Bathz, Apr 16 '12 at 11:18

score 1 · Accepted Answer · answered Apr 17 '12 at 12:36

I tried to mess with some variants of query.

It seems, like sqlite has errors in using of previous declared fields in a nested HAVING expressions.

In your example avg1 under second having is always equal to 5.0

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

...
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I can not find any similar bug at sqlite tickets list.

Yes I'm seeing something very similar with when tracing with VDBE. I already sent sqlite-users@sqlite.org an email about the issue. — , Apr 17 '12 at 12:49

score 1 · Answer 2 · answered Apr 18 '12 at 06:32

Lets look at this two ways, i'll use postgres 9.0 as my reference database

(1)

-- select rows from foo 

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having  not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for 
-- any row in T1
having avg2 > avg1);

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
(1 row)

then let's move some of the logic inside the subquery, getting rid of the 'not' : (2)

-- select rows from foo 
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having  exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal than the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join 
-- with the 'having' only we don't display the cartesian row set

 id  |        avg1        
-----+--------------------
 106 | 4.5000000000000000
 107 | 4.0000000000000000
(2 rows)

so you have got to ask yourself -- what do you actually mean when you do this correlated subquery inside a having clause, if it evaluates every row against every row from the primary query we are making a cartesian join and I don't think we should be pointing fingers at the SQL engine.

if you want every row that is less than the maximum average What you should be saying is:

select T1.id, avg(T1.score) avg1 
from foo T1 group by T1.id
having avg1 not in 
(select max(avg1) from (select id,avg(score) avg1 from foo group by id))

That select statement in my question is really not that difficult to understand; I simply want the correct result ;-) — , Apr 18 '12 at 09:57
I contend that it is ambiguous and will provide an indeterminate result across different technologies — Paddy Carroll, Apr 18 '12 at 10:21
It seems it's not ambiguous to MySql, MS Sql, and now Postgres; and it also seems that the main developer of sqlite is making changes in reaction to the email I sent to sqlite-users@sqlite.org. — , Apr 18 '12 at 10:25

ypercubeᵀᴹ · Answer 3 · 2012-04-17T14:15:40.450

0

Have you tried this version? :

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg(T1.score));

Also this one (which should be giving same results):

select T1.*
from
  ( select id, avg(score) avg1
    from foo 
    group by id
  ) T1
where not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg(T2.score) > avg1);

The query can also be handled with derived tables, instead of subquery in HAVING clause:

select ta.id, ta.avg1
from 
  ( select id, avg(score) avg1
    from foo
    group by id
  ) ta
  JOIN
  ( select avg(score) avg1
    from foo 
    group by id
    order by avg1 DESC
    LIMIT 1
  ) tmp
  ON tmp.avg1 = ta.avg1

edited Apr 17 '12 at 14:15

answered Apr 17 '12 at 13:03

ypercubeᵀᴹ

113,259
19
174
235

I just did; same 2 records in sqlite, same 1 record in mysql. – Apr 17 '12 at 13:06
Yes your second option I have already served as an answer; please see the question I linked at the beginning of this question. – Apr 17 '12 at 13:08
As to why the original query doesn't work as expected, I suppose it's a bug, related to how subqueries are handled. – ypercubeᵀᴹ Apr 17 '12 at 13:16
@sixfeetsix: Does the `JOIN` version shows correct (1-row) result? – ypercubeᵀᴹ Apr 17 '12 at 13:53
@sixfeetsix: I've added one more version with derived subquery, turning `HAVING` into `WHERE`. – ypercubeᵀᴹ Apr 17 '12 at 14:16

Why does SELECT results differ between mysql and sqlite?

3 Answers3

Linked