Skip result row if value is missing in group

Question

I've a Postgres table like this:

datetime	tenant_id	orders_today
2023-06-25 10:00	tenant2	2
2023-06-25 10:00	tenant1	1
2023-06-25 11:00	tenant1	5
2023-06-25 11:00	tenant2	2
2023-06-25 12:00	tenant1	5

Note that an orders_today value for tenant2 hasn't been generated yet for 12:00.

I use a query like this to summarize today's orders:

SELECT datetime, SUM(orders_today)
FROM orders
GROUP BY datetime

But this gives me this result:

datetime	sum
2023-06-25 10:00	3
2023-06-25 11:00	7
2023-06-25 12:00	5

How can I make it ignore the group for time 12 where a count for tenant 2 is missing? And, if possible, can I make it use the previous value for tenant 2 from time 11?

A `CREATE TABLE` statement showing exact data types and constraints would also be instrumental. And your version of Postgres. Declare how many tenants there can be. (Always 2?) And how are time range and timestamps for the query determined? (Both tenants could be missing for an extended period of time ...) — Erwin Brandstetter, Jun 25 '23 at 21:50
Also, you want to substitute the previous value only? Or the latest preceding value, no matter how long ago? And default to what if no previous row is available? — Erwin Brandstetter, Jun 25 '23 at 21:59

score 1 · Answer 1 · answered Jun 25 '23 at 22:32

can I make it use the previous value for tenant 2 from time 11?

Assuming:

(datetime, tenant_id) is the PRIMARY KEY. So: UNIQUE and both columns NOT NULL.
You want results for a given time range.
You want results for a given set of tenants ('tenant1' and 'tenant2' in the example).
One row for every full hour in the time range is expected.
Substitute for missing rows with the value for the same tenant from the previous hour.
If we can't get orders for each tenant this way, omit the row.
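
For reference, a table matching these assumptions could look like the following sketch. The column types and the NOT NULL on orders_today are guesses, since the question does not include a table definition; only the column names and sample rows come from the question.

```sql
-- Hypothetical definition: the types are assumed, not taken from the question.
CREATE TABLE orders (
   datetime     timestamp NOT NULL
 , tenant_id    text      NOT NULL
 , orders_today integer   NOT NULL   -- assumed NOT NULL, so NULL always means "row missing"
 , PRIMARY KEY (datetime, tenant_id)
);

-- Sample rows from the question; tenant2 has no row for 12:00.
INSERT INTO orders (datetime, tenant_id, orders_today) VALUES
   ('2023-06-25 10:00', 'tenant2', 2)
 , ('2023-06-25 10:00', 'tenant1', 1)
 , ('2023-06-25 11:00', 'tenant1', 5)
 , ('2023-06-25 11:00', 'tenant2', 2)
 , ('2023-06-25 12:00', 'tenant1', 5);
```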

Generate all combinations of interest from tenant & timestamp, LEFT JOIN to your table, LEFT JOIN to a possible substitute, then proceed:

SELECT datetime, sum(orders) AS sum_orders_today
FROM  (
   SELECT d.datetime, COALESCE(o.orders_today, o1.orders_today) AS orders
   FROM   generate_series(timestamp '2023-06-25 10:00'
                        , timestamp '2023-06-25 13:00'
                        , interval '1 hour') AS d(datetime)    -- given time range
   CROSS  JOIN (VALUES ('tenant1'), ('tenant2')) t(tenant_id)  -- given tenants
   LEFT   JOIN orders o USING (tenant_id, datetime)            -- regular data
   LEFT   JOIN orders o1 ON o1.tenant_id = t.tenant_id         -- substitutes
                        AND o1.datetime  = d.datetime - interval '1 hour'
   ) sub
GROUP  BY 1
HAVING count(orders) = 2  -- omit hours without full data set
ORDER  BY 1;

If my assumptions don't hold, you have to be more specific.
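
One assumption that might differ, as raised in the comments on the question: the substitute could be the latest preceding value for the tenant, no matter how long ago, rather than strictly the value from the previous hour. A minimal sketch of that variant, keeping the other assumptions and swapping the second join for a LATERAL subquery (Postgres 9.3 or later):

```sql
SELECT datetime, sum(orders) AS sum_orders_today
FROM  (
   SELECT d.datetime, COALESCE(o.orders_today, o1.orders_today) AS orders
   FROM   generate_series(timestamp '2023-06-25 10:00'
                        , timestamp '2023-06-25 13:00'
                        , interval '1 hour') AS d(datetime)    -- given time range
   CROSS  JOIN (VALUES ('tenant1'), ('tenant2')) t(tenant_id)  -- given tenants
   LEFT   JOIN orders o USING (tenant_id, datetime)            -- regular data
   LEFT   JOIN LATERAL (                                       -- latest earlier row, if any
      SELECT o2.orders_today
      FROM   orders o2
      WHERE  o2.tenant_id = t.tenant_id
      AND    o2.datetime  < d.datetime
      ORDER  BY o2.datetime DESC
      LIMIT  1
      ) o1 ON true
   ) sub
GROUP  BY 1
HAVING count(orders) = 2  -- omit hours where a tenant has no value at all
ORDER  BY 1;
```

With the sample data from the question, this variant should also return a row for 13:00, because each tenant then has some earlier row to fall back on. Whether that is desirable depends on how stale a substitute is allowed to be.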

score 0 · Answer 2 · answered Jun 25 '23 at 21:37

If I understand correctly, you want to exclude a datetime group when any tenant is missing from it. If so, you can use a HAVING clause requiring that the number of distinct tenants in each group equals the total number of distinct tenants in the table:

select datetime, sum(orders_today)
from orders
group by datetime
having count(distinct tenant_id) = (select count(distinct tenant_id) from orders)
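
The subquery in this condition counts tenants across the whole table. If the expected number of tenants is known up front and (datetime, tenant_id) is unique, a plain row count per group may be enough. A minimal sketch under those assumptions (exactly two tenants, table named orders as in the question):

```sql
SELECT datetime, sum(orders_today) AS sum_orders_today
FROM   orders
GROUP  BY datetime
HAVING count(*) = 2   -- assumed: exactly 2 tenants, at most one row per tenant and hour
ORDER  BY datetime;
```

Like the query above, this only skips incomplete hours; it does not substitute a previous value for the missing tenant.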
