1

I am writing a query to summarize the data in a Postgres database:

SELECT products.id, 
   products.NAME, 
   product_types.type_name AS product_type, 
   delivery_types.delivery, 
   products.required_selections, 
   Count(s.id)                AS selections_count, 
   Sum(CASE 
         WHEN ss.status = 'WARNING' THEN 1 
         ELSE 0 
       END)                AS warning_count 
FROM   products 
   JOIN product_types 
     ON product_types.id = products.product_type_id 
   JOIN delivery_types 
     ON delivery_types.id = products.delivery_type_id 
   LEFT JOIN selections_products sp 
          ON products.id = sp.product_id 
   LEFT JOIN selections s 
          ON s.id = sp.selection_id 
   LEFT JOIN selection_statuses ss 
          ON ss.id = s.selection_status_id 
   LEFT JOIN listings l 
          ON ( s.listing_id = l.id 
               AND l.local_date_time BETWEEN 
                   To_timestamp('2014/12/01', 'YYYY/mm/DD' 
                   ) AND 
                   To_timestamp('2014/12/30', 'YYYY/mm/DD') ) 
GROUP  BY products.id, 
      product_types.type_name, 
      delivery_types.delivery 

Basically we have a product with selections, these selections have listings and the listings have a local_date. I need a list of all products and how many listings they have between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same concept goes for warning_count. Also, I don't really understand why Postgres requires me to add a group by here.

The schema looks like this (the parts you would care about anyway):

products
  name:string
, product_type:fk
, required_selections:integer
, deliver_type:fk

selections_products
  product_id:fk
, selection_id:fk

selections
  selection_status_id:fk
, listing_id:fk

selection_status
  status:string

listing
 local_date:datetime
user3186332
  • 345
  • 3
  • 13
  • Your version of Postgres? Where does `f` in `count(f)` come from? When you write `how many listings`, do you want to count listings twice, if two selections of the same listing contain the same product? – Erwin Brandstetter Dec 25 '15 at 00:32

1 Answers1

2

The way you have it you LEFT JOIN to all selections irregardless of listings.local_date_time.

There is room for interpretation, we would need to see actual table definitions with all constraints and data types to be sure. Going out on a limb, my educated guess is you can fix your query with the use of parentheses in the FROM clause to prioritize joins:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , count(s.id) AS selections_count
     , sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (  -- LEFT JOIN!
          selections_products sp
   JOIN   selections s  ON s.id  = sp.selection_id  -- INNER JOIN!
   JOIN   listings   l  ON l.id  = s.listing_id     -- INNER JOIN!
                       AND l.local_date_time >= '2014-12-01'
                       AND l.local_date_time <  '2014-12-31'
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   ) ON sp.product_id = p.id
GROUP  BY p.id, pt.type_name, dt.delivery;

This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that aren't in any applicable selection.

Related:

While selecting all or most products, this can be rewritten to be faster:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , COALESCE(s.selections_count, 0) AS selections_count
     , COALESCE(s.warning_count, 0)    AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (
   SELECT sp.product_id
        , count(*) AS selections_count
        , count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
   FROM   selections_products sp
   JOIN   selections          s  ON s.id  = sp.selection_id
   JOIN   listings            l  ON l.id  = s.listing_id
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   WHERE  l.local_date_time >= '2014-12-01'
   AND    l.local_date_time <  '2014-12-31'
   GROUP  BY 1
   ) s ON s.product_id = p.id;

It's cheaper to aggregate and count selections and warnings per product_id first, and then join to products. (Unless you only retrieve a small selection of products, then it's cheaper to reduce related rows first.)

Related:


Also, I don't really understand why Postgres requires me to add a group by here.

Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. That does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.

My second query avoids this problem on the outset by aggregating before the join.


Aside: chances are, this doesn't do what you want:

l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD')
                      AND To_timestamp('2014/12/30', 'YYYY/mm/DD')

Since date_time seems to be of type timestamp (not timestamptz!), you would include '2014-12-30 00:00', but exclude the rest of the day '2014-12-30'. And it's always better to use ISO 8601 format for dates and timestamps, which is means the same with every locale and datestyle setting. Hence:

WHERE  l.local_date_time >= '2014-12-01'
AND    l.local_date_time <  '2014-12-31'

This includes all of '2014-12-30', and nothing else. No idea why you chose to exclude '2014-12-31'. Maybe you really want to include all of Dec. 2014?

WHERE  l.local_date_time >= '2014-12-01'
AND    l.local_date_time <  '2015-01-01'
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228