Multiple joins and aggregate functions producing incorrect totals

Question

I have 4 tables.

LAYERS 

id(pk)        location
-----------------------
  1            Bristol
  2            London

FEATURES

fid(pk)      layer_id(fk)     category
-----------------------------------------
  1.001        1                  Rank1
  1.002        1                  Rank2
  1.003        1                  Rank1
  2.001        2                  Rank3
  2.002        2                  Rank1

TOWN_STATUS 

id(pk)     town_id(fk)      fid(fk)      status
--------------------------------------------------
  1            5             1.001       AT_RISK
  2            6             1.001       DANGER
  3            19            2.002       DANGER
  4            23            2.002       DANGER
  5            24            2.002       AT_RISK
  6            30            2.002       DANGER

LANDMARK_STATUS 

id(pk)     landmark_id(fk)      fid(fk)      status
--------------------------------------------------
  1            1             1.002       DANGER
  2            10            1.002       DANGER
  3            11            2.002       DANGER
  4            13            2.002       AT_RISK

Each record in the FEATURES table is links to a record in the LAYERS table.

A spatial trigger operates on the features table to insert records into the TOWN_STATUS and LANDMARK_STATUS tables.

The status columns in the TOWN_STATUS and LANDMARK_STATUS tables can only have one of 2 values e.g.'AT RISK' or 'DANGER'

I would like to create a table which counts the number of features in a category and also how many 'AT RISK'or 'DANGER'statuses that category caused and filtered by a particular layer. So for example the table would look like this:

    category   feature_num    at_risk_num   danger_num
--------------------------------------------------------
    Rank1         3                 1         5
    Rank2         1                 0         3
    Rank3         1                 0         0

Currently I have an (Postgre)SQL query which looks like this

SELECT feature.category,
    count(feature.category) AS count,
    sum(
        CASE
            WHEN town_status.status = 'AT_RISK' OR landmark_status.status 'AT_RISK' THEN 1
            ELSE NULL
        END) AS at_risk_num,
    sum(
        CASE
            WHEN town_status.status::text = 'DANGER' OR landmark_status.status = 'DANGER' THEN 1
            ELSE NULL
        END) AS danger_num
   FROM features
     LEFT JOIN layer ON layer.id = feature.layer_id
     LEFT JOIN town_status ON town_status.fid = feature.fid
     LEFT JOIN landmark_status ON landmark_status.fid = feature.fid
  WHERE layer.location = 'Bristol'
  GROUP BY features.category
  ORDER BY features.category;

However when I do this , the count totals are incorrect for the feature_num column. I know it has something to do with the joins because of this answer but I cannot figure out how to form the last two columns based on the category of the features.

This seems likely to be a common error where people want some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, but they erroneously try to do all the joining then all the aggregating or to aggregate over previous aggregations. PS When you get a result that you don't expect/understand, stop trying to find your overall goal & find your misunderstanding is.--Isolate the first wrong subexpression & its input & output & learn what misconception, typo, wrong reasoning, etc led to it. (Debugging fundamental.) Ask about that. — philipxy, Feb 26 '20 at 17:53
Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) [ask] For SQL that includes DBMS & DDL, which includes constraints & indexes & tabular initialization. — philipxy, Feb 26 '20 at 17:54

score 1 · Accepted Answer · answered Feb 26 '20 at 17:10

Aggregate before joining:

SELECT f.category,
       COALESCE(ts.cnt, 0) + (COALESCE(ls.cnt, 0) as count,
       COALESCE(ts.at_risk, 0) + (COALESCE(ls.at_risk, 0) as at_risk,
       COALESCE(ts.danger, 0) + (COALESCE(ls.danger, 0) as danger
FROM features f LEFT JOIN
     layer l
     ON l.id = f.layer_id LEFT JOIN
     (SELECT ts.fid, COUNT(*) as cnt,
             COUNT(*) FILTER (WHERE ts.status = 'AT_RISK') as at_risk,
             COUNT(*) FILTER (WHERE ts.status = 'DANGER') as danger
      FROM town_status ts
      GROUP BY ts.fid
     ) ts
     ON town_status.fid = f.fid
     (SELECT ls.fid, COUNT(*) as cnt,
             COUNT(*) FILTER (WHERE ls.status = 'AT_RISK') as at_risk,
             COUNT(*) FILTER (WHERE ls.status = 'DANGER') as danger
      FROM landmark_status ls
      GROUP BY ls.fid
     ) ls
     ON ls.fid = f.fid;
WHERE l.location = 'Bristol'
GROUP BY f.category
ORDER BY f.category;

thank you! I have to modify it slightly for my version of Postgres but i think i've got the concept :) — equatorial_daydreamer, Feb 27 '20 at 17:00

Multiple joins and aggregate functions producing incorrect totals

1 Answers1