0

For some reason MySQL does not accept the following query:

SELECT `tasks`.* FROM `tasks` INNER JOIN `task_status_updates` ON `task_status_updates`.`task_id` = `tasks`.`id` GROUP BY task_status_updates.task_id HAVING `task_status_updates`.`status` = 0

Running the exact same query in SQLite, however, produces the expected results (A list of tasks whose last task_status_update has the status 0). This error just happened on the production environment of Rails, because of the difference in SQLite and MySQL.

MySQL throws the following error:

ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'task_status_updates.status' in 'having clause'

Could someone explain why above statement is invalid in MySQL, and how the wanted result can be obtained in a way that MySQL understands?

Qqwy
  • 5,214
  • 5
  • 42
  • 83
  • And have you considered that in mysql db you don't have this column in this table? – sagi Feb 24 '16 at 15:33
  • 3
    Have you checked whether [ONLY_FULL_GROUP_BY](http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by) is disabled on your server or not? – Giorgos Betsos Feb 24 '16 at 15:33
  • @sagi: Simpler queries like SELECT `task_status_updates`.* FROM `task_status_updates` GROUP BY `task_status_updates`.`task_id` HAVING `task_status_updates`.`status` IN (0, 1) work without problems. – Qqwy Feb 24 '16 at 15:35
  • @GiorgosBetsos wouldn't it say another error like cannot reffer to an nonaggregate column in the having clause? – sagi Feb 24 '16 at 15:35
  • This query is a tricky one from the standard view point. See this question for details: http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards – Vladislav Rastrusny Feb 24 '16 at 15:35
  • @sagi Hmm, yes, you have right. – Giorgos Betsos Feb 24 '16 at 15:37
  • GROUP BY serves no useful purpose here, and what's wrong with WHERE? – Strawberry Feb 24 '16 at 15:37
  • 2
    The query doesn't seem to make sense. When grouping by `task_id` and accessing `status` later it can only be one of all the task's statuses arbitrarily chosen. So it's rather a matter of luck whether you show a task in your results or not. So your real question should be: what do you want to achieve actually and what would be the appropriate query for this? (And how did the query shown make its way into the production environment?) – Thorsten Kettner Feb 24 '16 at 15:52
  • @ThorstenKettner Thank you, that is very helpful advice. I am amazed by the fact that it did work (by luck, as it seems) in SQLite. I will try to find another query. – Qqwy Feb 24 '16 at 16:12

1 Answers1

1

On no SQL engines will this SQL produce the list of tasks for which the last task status update has status 0. Instead, it will produce a list of tasks for which any task status update has status 0.

Furthermore, this is a profoundly non-standard use of GROUP BY. Both SQlite and MySQL happen to support the use of non-aggregated columns in the result set but standard SQL doesn't. Want you probably want is DISTINCT rather than GROUP BY but that won't help your original problem which is that any status 0 will be included in the result set, not just the most recent status update.

The reason that HAVING doesn't work in MySQL is that your result set includes only columns from tasks, there is no task_status_updates.tasks field in the result set against which to apply the HAVING filter. The real question is why it works in SQlite; that I can't say.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • Hmm, in that case I have misunderstood how the GROUP BY and HAVING works. When we introduce an ORDER BY (task_status_updates.created_at), I expect that 1) the Order happens before the Group By. 2) The Group By therefore picks the newest task_staus_update, regardless of status. 3) Having filters(rejects) all returned rows with the incorrect status 4) Because of the INNER JOIN, only tasks whose newest task_status_update passes the test remain. – Qqwy Feb 24 '16 at 15:50
  • SQLite just picks a random row in the group (unless some other column uses MIN or MAX). – CL. Feb 24 '16 at 17:55
  • I have decided to accept this answer, as it perfectly describes that **there is no answer**. Sorry for waiting so long before accepting. – Qqwy Apr 11 '16 at 09:43
  • 1
    @Qqwy Tables have no order. Query "result sets" (which aren't sets and aren't tables) have order. In subqueries ORDER BY has no meaning except to affect LIMIT. ORDER BY is done almost last, before LIMIT. (It can use columns added in a SELECT clause). – philipxy May 31 '17 at 22:31