Can MySQL be convinced of functional dependency when HAVING COUNT(*) = 1?

Question

I'm trying to find orders with only one item in a database running on MySQL 5.7.23 on Ubuntu 18.04 LTS. But somehow MySQL can't infer that COUNT(*) = 1 implies a functional dependence.

The following 2-table database of orders with order items illustrates the failure:

DROP TABLE IF EXISTS t_o, t_oi;
CREATE TABLE t_o (
  order_id INTEGER UNSIGNED PRIMARY KEY,
  placed_on DATE NOT NULL,
  INDEX (placed_on)
);
INSERT INTO t_o (order_id, placed_on) VALUES
(1, '2018-10-01'),
(2, '2018-10-02');
CREATE TABLE t_oi (
  item_id INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  order_id INTEGER UNSIGNED NOT NULL,
  sku VARCHAR(31) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  qty INTEGER UNSIGNED NOT NULL,
  unit_price INTEGER UNSIGNED NOT NULL,
  INDEX (sku),
  FOREIGN KEY (order_id) REFERENCES t_o (order_id)
    ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO t_oi (order_id, sku, qty, unit_price) VALUES
(1, 'SO', 1, 599),
(1, 'SF', 2, 399),
(2, 'SU', 1, 399);

SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

I expect this to return (2, '2018-10-02', 'SU', 1, 399) because it is the only order with only one item. I don't want any rows where order_id = 1 because that order has more than one item. But instead, MySQL gives the following error:

#1055 - Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'phs_apps.t_oi.sku' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

The manual explains "functionally dependent". But is there a way to express this functional dependence to MySQL that's cleaner than slinging MIN() around each output column for which MySQL complains? If at all possible, I'd prefer a solution that does not involve joining to t_oi twice, once to find relevant t_o.order_id values and once to append the details of each such order's sole item, as including a table twice in a single query is incompatible with use of TEMPORARY TABLE because of a 13-year-old "Can't reopen table" bug.

If those weren't temporary tables, you could simply rewrite the query (and would likely do it without a `group by` anyway). So I guess being required to use `min` (or, "cleaner", `any_value`) on most columns here is more or less a consequence of/workaround for the problem with temporary tables, not with the logic of the query itself. Since those are temporary tables, you may be able to add additional columns with e.g. the total numbers of items or something. — Solarflare, Oct 17 '18 at 06:26
What if you group by `GROUP BY t_o.order_id` (not t_oi)? Also change to `t_o.order_id` in select clause. — DanB, Oct 17 '18 at 17:53

score 2 · Answer 1 · answered Oct 16 '18 at 20:40

No, I don't think it's possible to convince MySQL to recognize the functional dependency with the special condition in the HAVING clause.

The HAVING clause gets evaluated much later in the query execution, after the rows have been accessed, after the GROUP BY operation, after the aggregates, etc.

We could remove ONLY_FULL_GROUP_BY from sql_mode. That would allow MySQL to process the query without throwing the error. But that's just going old school with a MySQL-specific non-standard extension to GROUP BY behavior. That doesn't mean that MySQL is convinced of functional dependency.

philipxy · Answer 2 · 2018-10-29T06:04:29.480

2

You can use function ANY_VALUE():

MySQL 8.0 Reference Manual / Functions and Operators / Miscellaneous Functions
12.22 Miscellaneous Functions

ANY_VALUE(arg)

This function is useful for GROUP BY queries when the ONLY_FULL_GROUP_BY SQL mode is enabled, for cases when MySQL rejects a query that you know is valid for reasons that MySQL cannot determine. The function return value and type are the same as the return value and type of its argument, but the function result is not checked for the ONLY_FULL_GROUP_BY SQL mode.

Or just take MIN() of each non-grouped column. Comment it. There will always be cases the DBMS can't or couldn't prove statically for given literals & functions or at runtime. So you need a solution like MIN() in your toolbox. You have to have some query/code rearrangement since there's no way to give the DMBS a proof or override. Although you could consider clearing ONLY_FULL_GROUP_BY to be that override. But wouldn't you have to comment clearing & restoring that too because it's not obvious?

You could assign the subquery to a table with an appropriate PK (primary key) or UNIQUE NOT NULL constraint. But you'd still want to comment why. Since the DBMS doesn't know about the FD (functional dependency) we can expect the assignment to not be optimized either. We can expect minimal overhead from something like MIN().

Indeed that manual section goes on to say:

There are multiple ways to cause MySQL to accept the query:

Alter the table to make [the functionally dependent column] a primary key or a unique NOT NULL column. [...]

Use ANY_VALUE() [...]

Disable ONLY_FULL_GROUP_BY. [...]

edited Oct 29 '18 at 06:04

answered Oct 16 '18 at 23:29

philipxy

14,867
6
39
83

using any_value simply bypasses ONLY_FULL_GROUP_BY it does not enforce `functional dependence` – Paul Maxwell Oct 29 '18 at 06:32
the question is `Can MySQL be convinced of functional dependency when HAVING COUNT(*) = 1` it cannot because `functional dependency` is tested by the group by clause. – Paul Maxwell Oct 29 '18 at 06:38
Really not sure why your interpretation of the question has to be the correct one. You are proposing (in comments) that SQL (not just MySQL) "should" take the having clause into account when evaluating the select clause. I disagree. Is it OK to leave it at that disagreement? – Paul Maxwell Oct 29 '18 at 06:54
You don't seem to realize that functional dependency is a relational database theory notion that the question & SQL happen to use. And I am saying that SQL *could* take it into account more than it does. OK, done. – philipxy Oct 29 '18 at 07:04
Teradata allows column aliases to be used in where clauses, and this is non-standard. So, yes you could build non-standard approaches for this problem in a similar vein, but in truth **I just don't see the point**. It is actually easier to repeat the non-aggregating columns of the select clause into the group by clause than it is to apply these workarounds. The very reason for this small debate arises from the non-standard group by rules MySQL initiated which has led to misconceptions by many. (*ps: I do actually know about the theoretical notion. Which is why I quoted it.*) – Paul Maxwell Oct 29 '18 at 07:17
Again: It has nothing to do with MySQL nonstandard group by, which the question & my comments make clear, and you don't seem to understand the issue. After having count(*)=1 the input to the select phase has one row per group. The columns that are single-valued per group only after that having can't be included in the group by because they are not single-valued before. SQL with & without functionality T301 disallows selecting certain columns naked even though they're single-valued per group at the select. So just use MIN()! But the asker is asking about writing an equivalent query without it. – philipxy Oct 29 '18 at 07:41
`somehow MySQL can't infer that COUNT(*) = 1 implies a functional dependence` I simply claim that functional dependence is determined before that filter can be applied. Hence MySQL cannot *infer* functional dependence via the having clause. Could we simply agree that we disagree on the question interpretation please. – Paul Maxwell Oct 29 '18 at 08:11
Clearly SQL *could* be defined to allow all columns naked after the special case `having ... and count(*)=1` & clearly it is reasonable to describe that as inferring that certain FDs hold. So clearly we disagree. – philipxy Oct 29 '18 at 08:22

DanB · Answer 3 · 2018-10-16T19:14:46.050

0

On that query "SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price", you are grouping on first column. You have to tell what to do with others columns. You can do a group_concat on sku column, or take the first entries on t_oi table with ranking function, so no group by will be needed anymore.

Try this, with ranking. Not sure, not tested.

SELECT t_o.order_id, t_o.placed_on, t_oi2.sku, t_oi2.qty, t_oi2.unit_price
FROM t_o
INNER JOIN (
    select t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price,
    @rank := case when @cur_order_id = t_oi.order_id then @rank + 1 else 1 end,
    @cur_order_id := t_oi.order_id
    from t_oi, (select @cur_order_id := 0, @rank := 0) tmp
    order by t_oi.order_id
    ) t_oi2 ON t_o.order_id = t_oi2.order_id and t_oi2.rnk = 1;

edited Oct 16 '18 at 19:14

answered Oct 16 '18 at 18:29

DanB

2,022
1
12
24

Normally, MySQL is supposed to detect which other columns have a functional dependence on the first column, and that dependence "tell[s] what to do with others columns" when no aggregate function is given. But it's failing to detect this particular dependence. – Damian Yerrick Oct 16 '18 at 18:40
How does one "take the first entries on t_oi table with ranking function" in MySQL 5.7? [MySQL 8 introduces ranking functions](https://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/), but I mentioned MySQL 5.7 because the repositories of the servers for which I'm coding do not yet ship MySQL 8. Does it involve `@var :=`? – Damian Yerrick Oct 16 '18 at 18:42
Yes, you can do with @var. See this post. https://stackoverflow.com/questions/3333665/rank-function-in-mysql I did many times in mysql 5.6. Not easy to read but it works well. – DanB Oct 16 '18 at 18:55
The code in the "Try this" edit produced `#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@cur_order_id := 0, @rank := 0) t_oi2 ON t_o.order_id = t_oi2.order_id and t_oi2' at line 7`. – Damian Yerrick Oct 16 '18 at 19:10
Sorry! And now? – DanB Oct 16 '18 at 19:15
1

@DanielBlais Reading & writing the same variable in a select statement is undefined behaviour, see the manual. – philipxy Oct 16 '18 at 23:16
@philipxy, Can you tell me more? I used this many times, and see many post with this trick, it always works. – DanB Oct 17 '18 at 01:15
1

Testing cannot show "always works", it can only show "always worked". Read the manual re variables. Google my comments like my last comment. You'll find links to the manual & to a bug report re poor wording in the manual. People at Percona have shown by examining the code that *certain use of case expressions in select clauses* are safe *in a particular implementation version*. (But I've lost my link to the Percona post.) Everybody else is the blind leading the blind. – philipxy Oct 17 '18 at 01:29
1

I just noticed [the 8.0 manual](https://dev.mysql.com/doc/refman/8.0/en/user-variables.html) has different but still poor language re this. But it also says "to assign a value to a user variable in statements other than SET" "is subject to removal in a future release". – philipxy Oct 17 '18 at 02:45
I do not think that this method can get the result that Damian wants. Your query will rank every item per order, and at row 1, you do not know yet if there will be another row for this order. But this is exactly what the filter would need: throw away orders that have a second row (including row 1), not just only take 1 row per order; this would require a second pass (and 2nd access to the temporary table) too. – Solarflare Oct 17 '18 at 06:34
@philipxy, Can't find anything on this. It will be removed, ok, but this doesn't mean to not use this now. In MySQL 8, ranking function is really helpful, but Damian use 5.7. But Solarflare is right, I missread the question and I'm not answering it right. My query won't show expected result. – DanB Oct 17 '18 at 17:52
1

The 5.7 manual tells you not to set & read the same variable in the same select statement. It also says some unclear stuff about setting & reading, but it's unclear so it's useless. My point re its removal per 8.0 is that what allegedly works has no clear limits let alone guarantees. People irrationally promote their & others' guesses & hopes to beliefs. "the blind leading the blind". Safe was looping with cursors or stored procedures. Now there are analytic functions. – philipxy Oct 17 '18 at 21:39

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

I believe your assumption about functional dependence to be wrong.

If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X->Y, which specifies Y is functionally dependent on X. Here X is a determinant set and Y is a dependent attribute. Each value of X is associated with precisely one Y value. techopedia

These 2 columns are functionally dependent (and the query operates). nb: Each value of t_o.placed_on is associated with precisely onet_oi.order_id value

SELECT t_oi.order_id, t_o.placed_on
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

These are NOT functionally dependent (and the query will not work unless you remove ONLY_FULL_GROUP_BY)

SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) =

Any of these t_oi.sku, t_oi.qty, t_oi.unit_price columns can hold any valid value for their data types. So they are NOT pre-determined by the relationship involved in the query.

select @@sql_mode;

| @@sql_mode                                                                                                            |
| :-------------------------------------------------------------------------------------------------------------------- |
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |

/* functionally dependent columns only */
SELECT t_oi.order_id, t_o.placed_on
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

order_id | placed_on 
-------: | :---------
       2 | 2018-10-02

/* any columns some not functionally dependent */
SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

Expression #3 of SELECT list is not in GROUP BY clause and 
contains nonaggregated column 'fiddle_YRLHCAMPBMVSWYXFQGUD.t_oi.sku' 
which is not functionally dependent on columns in GROUP BY clause; 
this is incompatible with sql_mode=only_full_group_by

SET sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'

✓

select @@sql_mode

| @@sql_mode                                                                                         |
| :------------------------------------------------------------------------------------------------- |
| STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |

/* any columns some not functionally dependent */
SELECT t_oi.order_id, t_o.placed_on, t_oi.sku, t_oi.qty, t_oi.unit_price
FROM t_o
INNER JOIN t_oi ON t_o.order_id = t_oi.order_id
GROUP BY t_oi.order_id
HAVING COUNT(*) = 1

order_id | placed_on  | sku | qty | unit_price
-------: | :--------- | :-- | --: | ---------:
       2 | 2018-10-02 | SU  |   1 |        399

db<>fiddle here

But the point is that *in the groups having count(\*)=1* there is only *one* value in each of those columns because there is only one row in the group. The query does make sense. The DBMS *could* add such groups to its FD inference, but it doesn't happen to. — philipxy, Oct 29 '18 at 06:27
@philipxy You cannot reach the having clause until you pass the group by clause, so the group by clause is what tests functional dependence. The having clause does not need to do so. — Paul Maxwell, Oct 29 '18 at 06:30
@philipxy there is but one meaning of `functional dependence`. MySQL has played a poor role in allowing lax grouping syntax. The rules of functional dependence are tight, but the op is hoping a lax query will bypass those rules. It won't. — Paul Maxwell, Oct 29 '18 at 06:37
The question isn't about group by. It is about allowing the select. The input or which is determined after the rest of the query. The rule for what can appear in a select could be per group by & per having. It just isn't. — philipxy, Oct 29 '18 at 06:37
A column is functionally dependent on a set of others when a subrow value for the set always appears with the same value in the column. The rules for select when group by is present *use* the *notion* of functional dependency. It is not defined as something to do with group by. A DBMS could allow selects of columns that are functionally dependent on grouping columns in the standared SQL input to the select phase. That is what the question is asking about. It has *nothing to do* with MySQL's non-standard mode except one could use that mode in a workaround. It is about standard SQL queries. — philipxy, Oct 29 '18 at 06:51

Can MySQL be convinced of functional dependency when HAVING COUNT(*) = 1?

4 Answers4