PostgreSQL: Query doesn't work using count

Question

WITH hi AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali, ps.bldg_name, fh.gridcode, ps.bldg_type
  FROM evidensapp_polystructures ps
  JOIN evidensapp_floodhazard fh ON fh.gridcode=3
                                 AND ST_Intersects(fh.geom, ps.geom)
), med AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali ,ps.bldg_name, fh.gridcode, ps.bldg_type
  FROM evidensapp_polystructures ps
  JOIN evidensapp_floodhazard fh ON fh.gridcode=2
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
), low AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali,ps.bldg_name, fh.gridcode, ps.bldg_type
  FROM evidensapp_polystructures ps
  JOIN evidensapp_floodhazard fh ON fh.gridcode=1
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
  EXCEPT SELECT * FROM med
)
SELECT brgy_locat, municipali, bldg_name,  bldg_type, gridcode, count( bldg_name)
FROM (SELECT brgy_locat, municipali, bldg_name, gridcode, bldg_type
      FROM hi
      GROUP BY 1, 2, 3, 4, 5) cnt_hi
FULL JOIN (SELECT brgy_locat, municipali,bldg_name, gridcode, bldg_type
      FROM med
      GROUP BY 1, 2, 3, 4, 5) cnt_med USING (brgy_locat, municipali, bldg_name,gridcode,bldg_type)
FULL JOIN (SELECT brgy_locat, municipali,bldg_name,gridcode, bldg_type
      FROM low
      GROUP BY 1, 2, 3, 4, 5) cnt_low USING (brgy_locat, municipali, bldg_name, gridcode, bldg_type)

The query above returns an error:

ERROR: column "cnt_hi.brgy_locat" must appear in the GROUP BY clause or be used in an aggregate function ********** Error **********

ERROR: column "cnt_hi.brgy_locat" must appear in the GROUP BY clause or be used in an aggregate function SQL state: 42803

But if I omit the count(bldg_name) it works. But I need to count based on bldg_name.

EDIT: I wanted to get the number of buildings that intersect with the hazard value(gridcode): High(3), Medium(2) and Low(1). But, if a certain geometry intersects already in High,exclude in it Medium query and same goes with Low exclude those geometry that intersects in High and Medium.

PostgreSQL: 9.4, PostGIS: 2.1.7

Table Details:

CREATE TABLE evidensapp_floodhazard (
  id integer NOT NULL DEFAULT nextval('evidensapp_floodhazard_id_seq'::regclass),
  gridcode integer NOT NULL,
  date_field character varying(60),
  geom geometry(MultiPolygon,32651),
  CONSTRAINT evidensapp_floodhazard_pkey PRIMARY KEY (id)
);

CREATE INDEX evidensapp_floodhazard_geom_id
  ON evidensapp_floodhazard USING gist (geom);

ALTER TABLE evidensapp_floodhazard CLUSTER ON evidensapp_floodhazard_geom_id;

CREATE TABLE evidensapp_polystructures (
  id serial NOT NULL,
  bldg_name character varying(100) NOT NULL,
  bldg_type character varying(50) NOT NULL,
  brgy_locat character varying(50) NOT NULL,
  municipali character varying(50) NOT NULL,
  province character varying(50) NOT NULL,
  geom geometry(MultiPolygon,32651),
  CONSTRAINT evidensapp_polystructures_pkey PRIMARY KEY (id)
);

CREATE INDEX evidensapp_polystructures_geom_id
  ON evidensapp_polystructures USING gist (geom);

ALTER TABLE evidensapp_polystructures CLUSTER ON evidensapp_polystructures_geom_id;

Intended output is like this but with correct count:

EDIT 2: As much as I try my best to explain what the intended output is, anyway:

count the bldg_name not the id in which what gridcode it intersects in floodhazard with the condition as mentioned above on the EDIT 1.
then group it to what brgy_locat,brgy_municipali and what gridcode and bldg_type it belong.

Kindly take a look at the image above.

`bldg_name`, e.g if that building intersect already in high hazard, exclude it in medium hazard, and with low, exclude those buildings already in high and medium hazard. This is to avoid counting the building twice or thrice. — Sachi Tekina, Aug 10 '15 at 10:07
Just the total count per hazard value? Please be more specific. We also need to know more about your table definition. Which columns are UNIQUE / NOT NULL in `evidensapp_polystructures`? The table definition provides that information (what you get with `\d evidensapp_polystructures` in psql) or a working test case. Why `FULL JOIN` in the final result? A test case with the desired result would clarify ... — Erwin Brandstetter, Aug 10 '15 at 10:13
Ok, I updated my question again, as I am new to database work. The answer is from the previous question I posted here. I just modified it to get my intended result but unfortunately, I couldn't solve it. — Sachi Tekina, Aug 10 '15 at 10:24
So you want `the number of buildings that intersect with the hazard value(gridcode)`, which would be the same for each row per CTE. But your picture shows different counts per row. What do you want to count *exactly*? Do you want a count per `evidensapp_floodhazard.id`? Either way, you cannot use `bldg_name` for the count, since that is not UNIQUE. — Erwin Brandstetter, Aug 10 '15 at 10:36
Yes, I want to count the buildings that intersect with each hazard value(gridcode;1,2,3) per `brgy_locat` and `municipality`. But with the condition given on the question. — Sachi Tekina, Aug 10 '15 at 11:09
So you want to count per `(gridcode, brgy_locat, municipality)`. This is the key information missing in the question. Can the same building intersect with *multiple* rows with the same `gridcode` in `evidensapp_floodhazard`? And do you want to count each building once or once per `evidensapp_floodhazard` in this case? — Erwin Brandstetter, Aug 10 '15 at 11:19
Ok, there's a possibility that the building will intersect multiple times. So, the building should be counted to the higher gridcode, eg. it intersects in both medium(gridcode;2) and low(gridcode;1), it should be counted on medium and not in low. — Sachi Tekina, Aug 10 '15 at 11:23
Can a building intersect with multiple rows of the ***same*** gridcode. And how to count in this case? (`gridcode` is not defined unique in `evidensapp_floodhazard`.) — Erwin Brandstetter, Aug 10 '15 at 11:26
I think that is not possible. But if it does, count it. The only thing that matters is if intersect on different gridcode, count it with the higher gridcode value and exclude it on the lower ones e.g. if it intersects in gridcode 3 already then exclude it in gridcode 2, same goes with gridcode 1. — Sachi Tekina, Aug 10 '15 at 11:34
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/86608/discussion-between-erwin-brandstetter-and-sachi-tekina). — Erwin Brandstetter, Aug 10 '15 at 12:36

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

You probably want this instead:

WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, ps.bldg_name, ps.bldg_type, fh.gridcode
        , count(*) OVER(PARTITION BY ps.bldg_name, ps.bldg_type) AS building_count
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_floodhazard    fh ON fh.gridcode = 3
                                      AND ST_Intersects(fh.geom, ps.geom)
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, ps.bldg_name, ps.bldg_type, fh.gridcode
        , count(*) OVER(PARTITION BY ps.bldg_name, ps.bldg_type) AS building_count
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_floodhazard    fh ON fh.gridcode = 2
                                      AND ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi USING (bldg_name, bldg_type)
   WHERE  hi.bldg_name IS NULL
   )
TABLE hi

UNION ALL
TABLE med

UNION ALL 
   SELECT ps.brgy_locat, ps.municipali, ps.bldg_name, ps.bldg_type, fh.gridcode
        , count(*) OVER(PARTITION BY ps.bldg_name, ps.bldg_type) AS building_count
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_floodhazard    fh ON fh.gridcode = 1
                                      AND ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi USING (bldg_name, bldg_type)
   LEFT   JOIN med USING (bldg_name, bldg_type)
   WHERE  hi.bldg_name IS NULL
   AND    med.bldg_name IS NULL;

Based on your comments to the question and the chat, this counts per (bldg_name, bldg_type) now - excluding buildings that already intersect on a higher level - again based on (bldg_name, bldg_type).

All other columns are either distinct (id, geom) or functionally dependent noise for the count (brgy_locat, municipali, ...). If not, add more columns the PARTITION BY clause to disambiguate buildings. And add the same columns to the USING clause of the JOIN condition.

If a building intersects with multiple rows in evidensapp_floodhazard with the same gridcode it is counted that many times. See alternative blow.

Since you do not actually want to aggregate rows but just count on partitions, the key feature is using count() as window function, not as aggregate function like in your original. Basic explanation:

Best way to get result count before LIMIT was applied

count(*) does a better job here:

Using LEFT JOIN / IS NULL instead of EXCEPT. Details:

Select rows which are not present in other table

And I failed to see the purpose of FULL JOIN in the outer query. Using UNION ALL instead.

Aternative query

This counts building once, no matter how many times it intersects with evidensapp_floodhazard on the same gridcode level

Also, this variant (unlike the first!) assumes that all rows for the same (bldg_name, bldg_type) match on the same gridcode level, which may or may not be the case:

SELECT brgy_locat, municipali, bldg_name, bldg_type, 3 AS gridcode
     , count(*) OVER(PARTITION BY bldg_name, bldg_type) AS building_count
FROM   evidensapp_polystructures ps
WHERE  EXISTS (
   SELECT 1 FROM evidensapp_floodhazard fh
   WHERE  fh.gridcode = 3 AND ST_Intersects(fh.geom, ps.geom)
   )

UNION ALL
SELECT brgy_locat, municipali, bldg_name, bldg_type, 2 AS gridcode
     , count(*) OVER(PARTITION BY bldg_name, bldg_type) AS building_count
FROM   evidensapp_polystructures ps
WHERE  EXISTS (
   SELECT 1 FROM evidensapp_floodhazard fh
   WHERE  fh.gridcode = 2 AND ST_Intersects(fh.geom, ps.geom)
   )
AND    NOT EXISTS (
   SELECT 1 FROM evidensapp_floodhazard fh
   WHERE  fh.gridcode > 2  -- exclude matches on **all** higher gridcodes
   AND    ST_Intersects(fh.geom, ps.geom)
   )

UNION ALL 
SELECT brgy_locat, municipali, bldg_name, bldg_type, 1 AS gridcode
     , count(*) OVER(PARTITION BY bldg_name, bldg_type) AS building_count
FROM   evidensapp_polystructures ps
WHERE  EXISTS (
   SELECT 1 FROM evidensapp_floodhazard fh
   WHERE  fh.gridcode = 1 AND ST_Intersects(fh.geom, ps.geom)
   )
AND    NOT EXISTS (
   SELECT 1 FROM evidensapp_floodhazard fh
   WHERE  fh.gridcode > 1 AND ST_Intersects(fh.geom, ps.geom)
   );

Also demonstrating a variant without CTEs, which may or may not perform better, depending on data distribution.

Index

Adding gridcode to the index might improve performance. (Not tested with PostGis):

You need to install the additional module btree_gist for this first. Details:

Equivalent to exclusion constraint composed of integer and range

CREATE INDEX evidensapp_floodhazard_geom_id
  ON evidensapp_floodhazard USING gist (gridcode, geom);

It is supposed to count the `bldg_name` and group it by `brgy_locat` and `municipali`. — Sachi Tekina, Aug 10 '15 at 11:46
"count the bldb_name" does not define how to count exactly. Two *distinct* buildings can have the same `bldg_name`, because those are not defined UNIQUE in your table. Typically, you'll want to count distinct *buildings*, not distinct building *names*. I have been trying to work out an *exact* definition or your count in several comments. It's the (missing) key part of your question. — Erwin Brandstetter, Aug 10 '15 at 11:53
Ok, I guess my mistake. I just want to count the `bldg_name` what gridcode it intersects in `flood_hazard` and group it by `brgy_locat` ,`municipali`, `bldg_type`, and its `gridcode` with the conditions in the question. Actually, the answer of @user3462803 is almost correct but it does not exclude the buildings that intersects already in a higher gridcode. — Sachi Tekina, Aug 10 '15 at 11:59
@SachiTekina: You keep writing `count the bldg_name`, but that does not make sense. What is it *exactly* that shall be counted? Distinct names or distinct buildings? `count(bldg_name)` counts *how many rows have a non-null value in the column `bldg_name`*. It just so happens that this compensates for all the NULL values created by your `FULL JOIN`s - which themselves are most probably a misunderstanding to begin with ... You should start at the beginning (in your question) and describe what you *have* and *want*, independent of the "solution" you have so far. — Erwin Brandstetter, Aug 10 '15 at 12:19
distinct bldg_names. I updated again my question but I don't know if were on the same frequency here. Thanks for your time. — Sachi Tekina, Aug 10 '15 at 12:28

Rahul · Answer 2 · 2015-08-06T02:49:55.373

1

Error is asking you to include the select list columns in the GROUP BY clause; you can do like below

SELECT brgy_locat, municipali, bldg_name,  bldg_type, 
gridcode, building_count
FROM (SELECT brgy_locat, municipali, bldg_name, gridcode, bldg_type,
      count( bldg_name) as building_count
      FROM hi
      GROUP BY 1, 2, 3, 4, 5) cnt_hi
FULL JOIN (SELECT brgy_locat, municipali,bldg_name, gridcode, bldg_type
      FROM med
      GROUP BY 1, 2, 3, 4, 5) cnt_med 
USING (brgy_locat, municipali, bldg_name,gridcode,bldg_type)
FULL JOIN (SELECT brgy_locat, municipali,bldg_name,gridcode, bldg_type
      FROM low
      GROUP BY 1, 2, 3, 4, 5) cnt_low 
USING (brgy_locat, municipali, bldg_name, gridcode, bldg_type);

edited Aug 06 '15 at 02:49

answered Aug 06 '15 at 02:34

Rahul

76,197
13
71
125

That would return an error `column "cnt_med.brgy_locat" must appear in the GROUP BY clause or be used in an aggregate function` – Sachi Tekina Aug 06 '15 at 02:41
@SachiTekina, No, it will not since I am grouping the outer query. Try once and see. – Rahul Aug 06 '15 at 02:43
See the answer and add it the same way; it depends on the column list in `select` clause ... select clause columns must be present in `group by` as well. See edit in answer. – Rahul Aug 06 '15 at 02:48
Ok, the error has been solved but I wanted to count `bldg_name` also in `med` and `low` on the same column as in `hi`. – Sachi Tekina Aug 06 '15 at 02:54
Include `count( bldg_name)` in rest 2 part of query as well same way. – Rahul Aug 06 '15 at 02:56
That would make column reference "building_count" ambiguous. – Sachi Tekina Aug 06 '15 at 02:59

user3462803 · Answer 3 · 2015-08-10T09:57:37.503

I don't know if this will work for you since I don't have enough knowledge regarding postgresql. Also not sure if this will give you what you want. But, give this a try. You just need to include building_count on your using clause.

SELECT brgy_locat, municipali, bldg_name,  bldg_type, 
gridcode, building_count
FROM (SELECT brgy_locat, municipali, bldg_name, gridcode, bldg_type,
      count( bldg_name) as building_count
      FROM hi
      GROUP BY 1, 2, 3, 4, 5) cnt_hi
FULL JOIN (SELECT brgy_locat, municipali,bldg_name, gridcode, bldg_type,
      count(bldg_name) as building_count
      FROM med
      GROUP BY 1, 2, 3, 4, 5) cnt_med 
USING (brgy_locat, municipali, bldg_name,gridcode,bldg_type, building_count)
FULL JOIN (SELECT brgy_locat, municipali,bldg_name,gridcode, bldg_type,
      count(bldg_name) as building_count
      FROM low
      GROUP BY 1, 2, 3, 4, 5) cnt_low 
USING (brgy_locat, municipali, bldg_name, gridcode, bldg_type, building_count);

I'm not after the reputation..I just updated Rahul's answer. Hope it helps. Cheers! :)

Ok, thanks..although it returns no error the result is wrong, it doesn't exclude the `id` from `hi` in the `med` subquery and so with `low` — Sachi Tekina, Aug 10 '15 at 10:00
you don't want to include the id on the result? Is gridcode your id? — user3462803, Aug 10 '15 at 10:13
No, `gridcode` is the basis for excluding the buildings that intersects on it. — Sachi Tekina, Aug 10 '15 at 10:14

PostgreSQL: Query doesn't work using count

3 Answers3

Aternative query

Index

Linked