Finding the count of duplicates between two values within a function

Question

I have this problem:

Create a function skillsInRange(n1 int, n2 int) that returns the count of Westerosis that have at least n1 skills and at most n2 skills.

With the relevant table:

INSERT INTO WesterosiSkill(wid, skill) VALUES
  (1001,'Archery'), (1001,'Politics'), (1002,'Archery'),
  (1002,'Politics'), (1004,'Politics'), (1004,'Archery'),
  (1005,'Politics'), (1005,'Archery'), (1005,'Swordsmanship'),
  (1006,'Archery'), (1006,'HorseRiding'), ...

It seems obvious that we only need to count the duplicate values in wid (first column). And then our second condition is only keeping the ones that have between n1 and n2 occurrences.

So I created this function:

CREATE FUNCTION skillsInRange (n1 int, n2 int)
RETURNS INTEGER AS
$$
SELECT COUNT(wid) AS wcount
FROM westerosiSkill
GROUP BY wid
HAVING wcount BETWEEN n1 AND n2
$$ LANGUAGE SQL;

And a few variants of it. None of which have worked. In this version, it tells me that wcount doesn't exist, even when I change HAVING to WHERE.

The SELECT COUNT ... GROUP BY returns the incorrect number of occurrences for each wid. And any of the HAVING clauses I've tried returns the incorrect number of occurrences between these two values, or in this instance, is syntactically incorrect due to the aggregate functions.

Erwin Brandstetter · Answer 1 · 2022-09-18T00:03:42.313

You need two levels of aggregation:

CREATE FUNCTION skillsinrange (n1 int, n2 int)
  RETURNS int
  LANGUAGE sql AS
$func$
SELECT count(*)::int
FROM  (
   SELECT count(*)::int AS wcount
   FROM   westerosiSkill
   GROUP  BY wid
   ) sub
WHERE wcount BETWEEN n1 AND n2;
$func$;

This is assuming that (wid, skill) is the PK, so both columns are defined NOT NULL, and the combination UNIQUE. Else you need to do more.

The function wrapper seems arbitrary. You might just use the plain query.

The cast to integer is just because you show a function signature with type integer for input and output. You might use bigint instead and ditch the casts.

For the error messages you saw, consider the manual:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

See:

PostgreSQL Where count condition

This is returning the wrong value when either n1 or n2 = 0. But the other non-zero cases appear correct. — zachs snachs, Sep 18 '22 at 01:12
A count of zero doesn't make sense, unless there is a separate "westerosis" table you neglected to mention ... — Erwin Brandstetter, Sep 18 '22 at 11:59

Finding the count of duplicates between two values within a function

1 Answers1