0

I have a query which get data summarised between two dates like so:

SELECT date(created_at),
COUNT(COALESCE(id, 0))                                                                      AS total_orders,
   SUM(COALESCE(total_price, 0))                                                               AS total_price,
   SUM(COALESCE(taxes, 0))                                                                     AS taxes,
   SUM(COALESCE(shipping, 0))                                                                  AS shipping,
   AVG(COALESCE(total_price, 0))                                                               AS average_order_value,
   SUM(COALESCE(total_discount, 0))                                                            AS total_discount,
   SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
  AND orders.active = true
  AND orders.created_at >= '2022-07-20'
  AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc

However for dates that do not have any orders, the query returns nothing and I'd like to return 0.

I have tried with COALESCE but that doesn't seem to do the trick?

Any suggestions?

WagnerMatosUK
  • 4,309
  • 7
  • 56
  • 95
  • What date value do you want to return in the case of a 0 count? – Tim Biegeleisen Jul 25 '22 at 07:01
  • Use `generate_seies` and `left join` to the resultant list of dates. – Jonathan Willcock Jul 25 '22 at 07:10
  • You can use a simple nested query and count the resulting rows if it's 0 pick the default values in the outer query. – visrey Jul 25 '22 at 07:10
  • @TimBiegeleisen `0` would be ideal. I'm making a graph with the result, so I can't have missing dates. – WagnerMatosUK Jul 25 '22 at 07:47
  • This sounds like something you should handle in the presentation layer which is calling Postgres. Just check for an empty result set, in that case, render a different graph. – Tim Biegeleisen Jul 25 '22 at 07:48
  • @JonathanWillcock & visrey do you have an example? – WagnerMatosUK Jul 25 '22 at 07:48
  • I'm formatting in the presentation layer but I'd like at least an empty array (or zero or similar) to so I can replace with what I need for the graph. Currently, if I query for the last week and only 1 day has orders, I only get one array and to fill the array with remaining dates is becoming expensive (this is for a dashboard and there are many similar queries) – WagnerMatosUK Jul 25 '22 at 07:51
  • A table definition showing data types and constraints (`CREATE TABLE orders ...`) would be instrumental. And always your version of Postgres. – Erwin Brandstetter Jul 25 '22 at 22:09

3 Answers3

2

This should be substantially faster - and correct:

SELECT *
     , total_price - taxes - shipping - total_discount AS net_sales  -- ⑤
FROM  (
   SELECT created_at
        , COALESCE(total_orders        , 0) AS total_orders
        , COALESCE(total_price         , 0) AS total_price
        , COALESCE(taxes               , 0) AS taxes
        , COALESCE(shipping            , 0) AS shipping
        , COALESCE(average_order_value , 0) AS average_order_value
        , COALESCE(total_discount      , 0) AS total_discount
   FROM   generate_series(timestamp '2022-07-20'  -- ①
                        , timestamp '2022-07-26'
                        , interval '1 day') AS g(created_at)
   LEFT  JOIN (  -- ③
      SELECT created_at::date
           , count(*)            AS total_orders  -- ⑥
           , sum(total_price)    AS total_price
           , sum(taxes)          AS taxes
           , sum(shipping)       AS shipping
           , avg(total_price)    AS average_order_value
           , sum(total_discount) AS total_discount
      FROM   orders
      WHERE  shop_id = 43
      AND    active  -- simpler
      AND    created_at >= '2022-07-20'
      AND    created_at <  '2022-07-27'  -- ② !
      GROUP  BY 1
      ) o USING (created_at)  -- ④
   ) sub
ORDER  BY created_at DESC;

db<>fiddle here

I copied, simplified, and extended Xu's fiddle for comparison.

① Why this particular form for generate_series()? See:

② Assuming created_at is data type timestamp your original formulation is most probably incorrect. created_at <= '2022-07-26' would include the first instant of '2022-07-26' and exclude the rest. To include all of '2022-07-26', use created_at < '2022-07-27'. See:

③ The LEFT JOIN is the core feature of this answer. Generate all days with generate_series(), independently aggregate days from table orders, then LEFT JOIN to retain one row per day like you requested.

④ I made the column name match created_at, so we can conveniently shorten the join syntax with the USING clause.

⑤ Compute net_sales in an outer SELECT after replacing NULL values, so we need COALESCE() only once.

count(*) is equivalent to COUNT(COALESCE(id, 0)) in any case, but cheaper. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

Please refer to the below script.

SELECT *
FROM 
    (SELECT date(created_at) AS created_at,
         COUNT(id) AS total_orders,
         SUM(total_price) AS total_price,
         SUM(taxes) AS taxes,
         SUM(shipping) AS shipping,
         AVG(total_price) AS average_order_value,
         SUM(total_discount) AS total_discount,
         SUM(total_price - taxes - shipping - total_discount) AS net_sales
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26'
    GROUP BY  date (created_at)
UNION
SELECT dates AS created_at,
         0 AS total_orders,
         0 AS total_price,
         0 AS taxes,
         0 AS shipping,
         0 AS average_order_value,
         0 AS total_discount,
         0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN 
    (SELECT created_at
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY  created_at::date desc;

There is one sample for your reference. Sample

I got your duplicate test cases at my side. The root cause is created_at field (datattype:timestamp), hence there are duplicate lines.

Below script is correct for your request.

SELECT *
FROM 
    (SELECT date(created_at) AS created_at,
         COUNT(id) AS total_orders,
         SUM(total_price) AS total_price,
         SUM(taxes) AS taxes,
         SUM(shipping) AS shipping,
         AVG(total_price) AS average_order_value,
         SUM(total_discount) AS total_discount,
         SUM(total_price - taxes - shipping - total_discount) AS net_sales
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26'
    GROUP BY  date (created_at)
UNION
SELECT dates AS created_at,
         0 AS total_orders,
         0 AS total_price,
         0 AS taxes,
         0 AS shipping,
         0 AS average_order_value,
         0 AS total_discount,
         0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN 
    (SELECT date (created_at)
    FROM orders
    WHERE shop_id = 43
        AND orders.active = true
        AND orders.created_at >= '2022-07-20'
        AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY  created_at::date desc;

Here is a sample that's same with your side. Link

Courser Xu
  • 142
  • 4
0

You can use WITH RECURSIVE to build a table of dates and then select dates that are not in your table

WITH RECURSIVE t(d) AS (
  (SELECT '2015-01-01'::date)
UNION ALL
  (SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);

[look on this post : ][1]

[1]: https://stackoverflow.com/questions/28583379/find-missing-dates-postgresql#:~:text=You%20can%20use%20WITH%20RECURSIVE,SELECT%20d_date%20FROM%20tbl)%3B

  • 1
    `generate_series()` is much easier to use and probably more efficient as well –  Jul 25 '22 at 07:17