This question is about the functionality of first_value()
, using another function or workaround.
It is also about "little gain in performance" in big tables. To use eg. max()
in the explained context below, demands spurious comparisons. Even if fast, it imposes some additional cost.
This typical query
SELECT x, y, count(*) as n
FROM t
GROUP BY x, y;
needs to repeat all columns in GROUP BY
to return more than one column. A syntactic sugar to do this, is to use positional references:
SELECT x, y, count(*) as n
FROM t
GROUP BY x, 2 -- imagine that 2, 3, etc. are repeated with x
Sometimes needs not only sugar, but also some semantic to understand complex context:
SELECT x, COALESCE(y,z), count(*) as n
FROM t
GROUP BY x, y, z -- y and z are not "real need" grouping clauses?
I can imagine many other complex contexts. Let's see usual solutions:
SELECT x, max(y) as y, count(*) as n
FROM t
GROUP BY x -- best semantic! no need for other columns here
where max()
function can be any "sample()" (eg. first or last value). The performance of something that do nothing is better than max()
, e.g. the aggregate function first_value()
, but it needs a WINDOW
, so lost performance. There are some old suggestions to implement first/last agg functions in C.
Is there some "get any one value fast" aggregate function with better performance than max()
or GROUP BY X,2,...
?
Perhaps some new feature in a recent release?