SELECT with sorting and counting by a joined table

Question

I'm having hard times with a PostgreSQL SELECT that at first looked pretty simple. Involved tables:

CREATE TABLE events (id INT, customers_id INT);
CREATE TABLE jobs (
  events_id INT,
  "from"    TIMESTAMP,
  until     TIMESTAMP,
  users_id  INT);

each event can have multiple jobs
start and end of an event is defined by the lowest "from" and the highest until of the corresponding jobs
each job can be assigned to a user

I need a table like the following:

events_id | customers_id | min(from) | max(until) | total_jobs | open_jobs
        1 |            1 |  .. 08:00 |   .. 11:00 |          4 |        1

My select so far:

SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id

This gives me the result for the first 5 columns, but how can I include a second count for just the jobs with users_id = NULL? I would assume that I need a second LEFT JOIN on the jobs table but somehow I can't get it working.

How can I implement that correctly and efficiently?

What do you mean by a second count for just the jobs with users_id = NULL? What is an open job? jobs that don't have users? ie, You want to know how to calculate open_jobs which is the number of jobs with user_id as null? — kkica, Oct 17 '18 at 21:37
Are you querying for all events (like your query suggests) or for one or few given `events_id`? Please always disclose your Postgres version. `SELECT version()` helps. I took the liberty to fix the syntax of your setup. Reserved words like `from` require double-quoting (except when table-qualified like in `j.from`). And what about the "sorting" you mention in the title? — Erwin Brandstetter, Oct 17 '18 at 22:52
@KristjanKica Sorry for not clarifying "open job". I mean the jobs with users_id = NULL — PKeidel, Oct 18 '18 at 08:47
@ErwinBrandstetter I'm querying for all events. Postgres version is 10.5. Thank you for your edit. I forgot the sorting, but it is only a `ORDER BY min("from")` — PKeidel, Oct 18 '18 at 08:51

score 1 · Answer 1 · answered Oct 17 '18 at 21:39

1

Assuming open job definition is until is null

WITH open_jobs_cte as (select events_id, customer_id, count(*) open_jobs FROM jobs WHERE until is null group by 1,2)
SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, count(j.id) as total_jobs, open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
LEFT JOIN open_jobs_cte oj ON oj.events_id = e.id
GROUP BY e.id, e.customers_id

Based on your requirements/design customer can be excluded from cte

answered Oct 17 '18 at 21:39

demircioglu

3,069
1
15
22

Sorry, `open job` is a job with `users_id = NULL`. But that can be changed easily. I'll give it a try later when I'm at home. Never heard of `WITH (cte)`, thanks for your answer – PKeidel Oct 18 '18 at 09:06
If I remove customer_id (since it's in `events`) and change it to `users_id is null`, it gives me the error `ERROR: column "oj.open_jobs" must appear in the GROUP BY clause or be used in an aggregate function` – PKeidel Oct 18 '18 at 16:48

kkica · Answer 2 · 2018-10-18T08:57:27.743

Assuming open_jobs is the number of jobs with user_id as null?

SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs,sum(case user_id is null then 1 else 0) as open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id

Notice the sum(case user_id is null then 1 else 0). If the user_id is null then you have an open job, so you add 1, otherwise you add 0.

EDIT: Since you also need to sort by min_from, you can

select *from
(
SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs,sum(case user_id is null then 1 else 0) as open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id
) as results 
order by min_from

Thank you for your answer. I'll give it a try later when I'm at home. — PKeidel, Oct 18 '18 at 09:08

Erwin Brandstetter · Accepted Answer · 2018-10-18T21:14:09.830

While getting all or most events, it's typically fastest (and simplest) to aggregate the n-table before the join:

SELECT e.id, e.customers_id
     , j.min_from
     , j.max_until
     , j.total_jobs
     , j.open_jobs
FROM   events e
LEFT   JOIN (
   SELECT events_id    AS id -- alias only to ease join syntax
        , min("from")  AS min_from
        , max(until)   AS max_until
        , count(*)     AS total_jobs
        , count(*) FILTER (WHERE users_id IS NULL) AS open_jobs
-- equivalent for old versions:
--      , count(users_id IS NULL OR NULL)          AS open_jobs
   FROM   jobs
   GROUP  BY 1
   )  j USING (id);

This way you don't need to GROUP BY the 1-table at all.

And since you were considering a second LEFT JOIN: If you don't aggregate first, you'd run into a "proxy cross join" situation with this.

Two SQL LEFT JOINS produce incorrect result

The aggregate FILTER clause requires Postgres 9.4 or later.

How can I simplify this game statistics query?

I'll give it a try later. A proxy cross join could be the situation I ran into. `FILTER` looks great, I've never heard of it before. — PKeidel, Oct 18 '18 at 10:28

SELECT with sorting and counting by a joined table

3 Answers3