Returning the nth largest value in postgresql

Question

I want to return the nth largest salary in the table using for-loops, and I am still having trouble with the syntax. Please help.

create function get_nth_max(n integer)
returns real
as
$$
declare
nth_max real = max(salary) from company;
pay real;
begin
for pay in select salary from company 
    order by salary = desc limit n 
    loop
        if pay.salary < nth_max then
        nth_max = pay.salary and
end loop;
return nth_max
end;
$$
language plpgsql;

Error message:

ERROR:  syntax error at or near "desc"
LINE 10:  order by salary = desc limit n loop
                            ^
SQL state: 42601
Character: 182

`order by salary = desc` is invalid. Get rid of the `=` there. Also there is a dangling `AND` at the end of your `IF` statement, and the `end if` is missing. Please see [the manual](https://www.postgresql.org/docs/current/plpgsql.html) for the correct syntax of `IF` — , Feb 02 '22 at 07:00

score 2 · Answer 1 · answered Feb 02 '22 at 04:12

2

You don't need a UDF for this, just use DENSE_RANK:

WITH cte AS (
    SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) drnk
    FROM company
)

SELECT salary
FROM cte
WHERE drnk = <value of n here>

answered Feb 02 '22 at 04:12

Tim Biegeleisen

502,043
27
286
360

I just started learning postgresql a few days ago so I don't know WITH _ AS, DENSE_RANK(), and OVER. I'll try to follow your code by googling, but the main reason I used a for loop is to get used to basic syntax. I don't understand why the code won't work. Thanks! – TheLast Cipher Feb 02 '22 at 04:20
2

@TheLastCipher: see [window functions](https://www.postgresql.org/docs/current/tutorial-window.html) and [WITH queries](https://www.postgresql.org/docs/current/queries-with.html) in the Postgres manual rather than "googling around" – Feb 02 '22 at 06:59

score 2 · Accepted Answer · answered Feb 02 '22 at 22:06

First you need to define "the nth largest salary in the table" clearly.

What about duplicates? If two people earn 1000 and one earns 800, is 800 then the 2nd or 3rd highest salary? Can salary be NULL? If so, ignore NULL values? What if there are not enough rows?

Assuming ...

Duplicate entries only count once - so 800 is considered 2nd in my example.
salary can be NULL. Ignore NULL values.
Return NULL or nothing (no row) if there are not enough rows.

Proper solution

SELECT salary
FROM  (
   SELECT salary, dense_rank() OVER (ORDER BY salary DESC NULLS LAST) AS drnk
   FROM   company
   ) sub
WHERE  drnk = 8;

That's basically the same as Tim's answer, but NULLS LAST prevents NULL from coming first in descending order. Only needed if salary can be NULL, obviously, but never wrong. Best supported with an index using the same sort order DESC NULLS LAST. See:

Sort by column ASC, but NULL values first?

`FOR` loop as proof of concept

If you insist on using a FOR loop for training purposes, this would be the minimal correct form:

CREATE OR REPLACE FUNCTION get_nth_max(n integer, OUT nth_max numeric)
  LANGUAGE plpgsql AS
$func$
BEGIN
   FOR nth_max IN
      SELECT DISTINCT salary
      FROM   company
      ORDER  BY salary DESC NULLS LAST
      LIMIT  n
   LOOP
      -- do nothing
   END LOOP;
END
$func$;

Working with numeric instead of real, because we don't want to use a floating-point number for monetary values.

Your version does a lot of unnecessary work. You don't need the overall max, you don't need to check anything in the loop, just run through it, the last assignment will be the result. Still inefficient, but not as much.

Notably, SELECT get_nth_max(10) returns NULL if there are only 9 rows, while the above SQL query returns no row. A subtle difference that may be relevant. (You could devise a function with RETURNS SETOF numeric to return no row for no result ...)

Using an OUT parameter to shorten the syntax. See:

Returning from a function with OUT parameter

score 0 · Answer 3 · answered Feb 02 '22 at 07:05

So I have missing semicolons, did not end my if-statement, and a 'dangling' AND. The following code is the working version:

create or replace function get_nth_max(n integer)
returns real
as
$$
declare
nth_max real = max(salary) from company;
pay record;
begin
for pay in select salary from company 
    order by salary desc limit n 
    loop
        if pay.salary < nth_max then
        nth_max = pay.salary;
        end if;
    end loop;
return nth_max;
end;
$$
language plpgsql;

Returning the nth largest value in postgresql

3 Answers3

Proper solution

FOR loop as proof of concept

`FOR` loop as proof of concept