Which is the least expensive aggregate function in the absence of ANY()

Question

I usally use MAX() or MIN() if a DBMS hasn't an ANY() aggregate function.

Is there something less expensive in mySQL and MS-SQL?

Why would an ANY() aggregate function exist? What would its specification be? — Jonathan Leffler, May 19 '11 at 14:30
What do you mean with `ANY()`? It should return one random row? — ypercubeᵀᴹ, May 19 '11 at 14:31
`any(expresion)` : returns any value of expresion within the group. in order to have good performance with trivial dependant columns in cases where all are the same or just any is good enaugh — Saic Siquot, May 19 '11 at 14:35
No ANY aggregate in ANSI-92 SQL http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt — gbn, May 19 '11 at 14:37
SQL Server has `first()` and `last()` for "don't-care" fields in grouped queries. MySQL doesn't require you to do anything to get non-aggregated fields in grouped queries, just use the fieldname as you would in any other select. — Marc B, May 19 '11 at 14:58
@Mark B: I know them on Access but not on SQL server, I didn't know the MySQL way, I have just learned. Thanks — Saic Siquot, May 19 '11 at 15:06
@LuisSiquot - [See also my answer here](http://stackoverflow.com/questions/7753319/sql-server-pick-random-or-first-value-with-aggregation/7753492#7753492) — Martin Smith, Oct 17 '11 at 09:37

score 3 · Answer 1 · edited May 23 '17 at 12:24

3

There is no ANY aggregate in ANSI SQL-92

There is the ANY qualifier to match SOME and ALL

TSQL - SOME | ANY why are they same with different names? (SO)
ALL, ANY, and SOME: The Three Stooges (SQL Server Pedia)

MIN and MAX are proper aggregates... completely unrelated and shouldn't be compared

Edit:

Only MySQL has this ambiguity of an "ANY" aggregate: SQL Server, Sybase, Oracle, PostGres do not. See Do all columns in a SELECT list have to appear in a GROUP BY clause

edited May 23 '17 at 12:24

Community

1
1

answered May 19 '11 at 14:42

gbn

422,506
82
585
676

i have selected a wrong name to "my aggregate" function. but I mean a way to have less expensive alternative of max() or min() if it exists – Saic Siquot May 19 '11 at 15:01
great!! [link](http://stackoverflow.com/questions/5986127/do-all-columns-in-a-select-list-have-to-appear-in-a-group-by-clause) – Saic Siquot May 19 '11 at 15:13
1

[SQL Server does have an `ANY` aggregate](http://sqlblog.com/blogs/paul_white/archive/2011/07/02/undocumented-query-plans-the-any-aggregate.aspx) – Martin Smith Oct 17 '11 at 09:23
@gbn - I've no idea if there would be any benefit to rewriting a query to get it though. Might be some minor reduction in CPU time potentially as it doesn't have to do any comparisons? – Martin Smith Oct 17 '11 at 09:34
@Martin Smith: I'd say it is an internal optimisation to deal with such cases as this above. If usable directly, it'd cause confusion like the MySQL group by nonsense – gbn Oct 17 '11 at 10:07
This answer is no longer accurate as of 2015 - [see this update](https://stackoverflow.com/a/67836358/2738164) for MySQL 5.7+. – bsplosion Jun 04 '21 at 11:55

score 1 · Answer 2 · answered May 19 '11 at 14:30

1

MIN and MAX are equally (in)expensive.

answered May 19 '11 at 14:30

Jonathan Leffler

730,956
141
904
1,278

1

i know, but is there other less expensive? – Saic Siquot May 19 '11 at 14:31
@Luis - If the field is indexed, no. `MIN` and `MAX` on an indexed field just go to the first or last entry in the index. – JNK May 19 '11 at 14:32
@JNK: but MIN() or MAX() doesn't? – CristiC May 19 '11 at 14:39
@Parkyprg: No, `MIN` and `MAX` will use the index (if there is one). – ypercubeᵀᴹ May 19 '11 at 14:42
@JNK - That is only true if you are getting a single `MAX` rather than using `GROUP BY`. If you are doing a `GROUP BY` even if an index exists on `group_column,aggregated_column` it won't skip to the next group. [so for some cardinalities a recursive CTE can be better](http://stackoverflow.com/questions/7753319/sql-server-pick-random-or-first-value-with-aggregation/7753492#7753492) – Martin Smith Oct 17 '11 at 09:53
@MartinSmith - If I have to do a lot of aggregation, I normally do an index on `(GroupField) INCLUDE (Aggregates)`, but yes this could be different based on aggregation – JNK Oct 17 '11 at 12:17

Johan · Accepted Answer · 2011-05-19T15:24:05.970

1

MySQL does not need an ANY() aggregate.

if I do a

SELECT field1, field2, SUM(field3) FROM table1 
GROUP BY field2

Microsofts T-SQL will complain but

MySQL will just silently execute

SELECT whatever(field1), field2, SUM(....

Which of course is way faster than SELECT max(field1), field2, SUM(.... GROUP BY field2

MySQL supports ANY, but SELECT ANY(field1) ... doesn't work because ANY is an operator similar to IN and ALL.
see: http://dev.mysql.com/doc/refman/5.0/en/any-in-some-subqueries.html

I love MySQL

edited May 19 '11 at 15:24

answered May 19 '11 at 14:42

Johan

74,508
24
191
319

10

Ambiguious and not explicit = dangerous. – gbn May 19 '11 at 14:43
3

@gbn, many tools give you a gun to shoot with, that's why you need to know your tool, so you don't shoot yourself in the foot. Whether guns kill people or people kill people is another discussion entirely. – Johan May 19 '11 at 14:45
@Johan Great!! I have seen SELECT field1, field2, SUM(field3) FROM table1 GROUP BY field2, field1 ... but field 1 is reudunat as it is field2 dependant. I think it is worst than min(field1) also – Saic Siquot May 19 '11 at 14:48
4

Poor analogy: not comparable. In this situation I see no need to have the RDBMS guess a column on my behalf. *Only* MySQL has this ambiguity too: SQL Server, Sybase, Oracle, PostGres **do not**. See http://stackoverflow.com/q/5986127/27535 – gbn May 19 '11 at 14:48
1

@gbn, personally I think the any() keyword is the better solution, looking at the docs MySQL does support it, but because sooo many MySQL queries depend on the old `dangerous` behavior it's not going to go away. The whole idiology of MySQL is to be as forgiving as possible, for some things this makes it the **right tool for the job** TM, for other jobs not. Know your tools and choose wisely. – Johan May 19 '11 at 14:54
5

What a fantastic coinage, and very appropriate: *idiology*, as in idio[tic] [ideo]logy, as in "the idiology of MySQL". I'll remember that one. (The correct spelling is *ideology*, btw.) – LukeH May 19 '11 at 15:20
5

@Johan - To carry your firearms analogy to it's conclusion - Quality guns (all other RDBMS) give you a safety switch to prevent accidental firing. MYSQL gives you a hair trigger and overloaded ammunition to make it much easier to hurt yourself. – JNK May 19 '11 at 15:36
1

Strikes me as error prone. In DBs that handle windowing, you can get this behavior in a more obvious manner. – Andrew Lazarus Jun 10 '11 at 04:29
@JNK, @Johan, @Andrew Lazarus: An example of why this MySQL behaviour is stupid http://stackoverflow.com/q/6642241/27535 – gbn Jul 10 '11 at 16:40
@gbn - I didn't require any convincing, but thanks for the link. – JNK Jul 10 '11 at 17:07
@JNK: I know from above, but some ammunition in case of bunfight... ;-) – gbn Jul 10 '11 at 17:21
1

As a note, this is no longer the case with default connection settings, particularly under MySQL Workbench - this is controlled by `sql_mode=only_full_group_by`. Issuing such a query will show the following: `Error Code: 1055. Expression #NUMBER of SELECT list is not in GROUP BY clause and contains nonaggregated column 'TABLE.COLUMN' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by` – bsplosion Jun 04 '21 at 11:10

score 1 · Answer 4 · answered Jun 04 '21 at 11:22

As of MySQL 5.7 (released October 2015), there is actually a function for this! any_value(col) explicitly meets this need - see the documentation for details.

However, it's important to note that it still appears to not guarantee a short-circuit and may still perform a full scan, so the goal of higher efficiency may not be met. If MySQL is intelligent about it, there is some chance, though testing would be wise. There are two circumstances in which it may be helpful anyway:

If the value to be aggregated is particularly large and unindexed, such that any comparisons would be expensive.
If the intention in the code would be made less clear by leveraging max or another placeholder aggregate. For example, a max may imply there was some reason for its existence rather than getting "anything" from the grouping, but any_value would explicitly define the intention without additional commenting.

Toy sample:

select
col, 
any_value(a.val) as any_val
from (
    select 'blue' as col, 3 as val
    union all
    select 'blue' as col, 2 as val
    union all
    select 'green' as col, 1 as val
) as a
group by col

score 0 · Answer 5 · answered Mar 15 '15 at 19:31

Fine, let's rephrase the original question. Instead of ANY, which is ambiguous and not SQL standard, it would be great if EXISTS-aggregate function was provided.

On several occasions I have used "count(outer.column) > 0" to imitate the "exists(outer.column)", which would be arguably cheaper since counting is discarded anyway.

Which is the least expensive aggregate function in the absence of ANY()

5 Answers5

Linked