1

I'm having hard times with a PostgreSQL SELECT that at first looked pretty simple. Involved tables:

CREATE TABLE events (id INT, customers_id INT);
CREATE TABLE jobs (
  events_id INT,
  "from"    TIMESTAMP,
  until     TIMESTAMP,
  users_id  INT);
  • each event can have multiple jobs
  • start and end of an event is defined by the lowest "from" and the highest until of the corresponding jobs
  • each job can be assigned to a user

I need a table like the following:

events_id | customers_id | min(from) | max(until) | total_jobs | open_jobs
        1 |            1 |  .. 08:00 |   .. 11:00 |          4 |        1

My select so far:

SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id

This gives me the result for the first 5 columns, but how can I include a second count for just the jobs with users_id = NULL? I would assume that I need a second LEFT JOIN on the jobs table but somehow I can't get it working.

How can I implement that correctly and efficiently?

PKeidel
  • 2,559
  • 1
  • 21
  • 29
  • What do you mean by a second count for just the jobs with users_id = NULL? What is an open job? jobs that don't have users? ie, You want to know how to calculate open_jobs which is the number of jobs with user_id as null? – kkica Oct 17 '18 at 21:37
  • Are you querying for all events (like your query suggests) or for one or few given `events_id`? Please always disclose your Postgres version. `SELECT version()` helps. I took the liberty to fix the syntax of your setup. Reserved words like `from` require double-quoting (except when table-qualified like in `j.from`). And what about the "sorting" you mention in the title? – Erwin Brandstetter Oct 17 '18 at 22:52
  • @KristjanKica Sorry for not clarifying "open job". I mean the jobs with users_id = NULL – PKeidel Oct 18 '18 at 08:47
  • @ErwinBrandstetter I'm querying for all events. Postgres version is 10.5. Thank you for your edit. I forgot the sorting, but it is only a `ORDER BY min("from")` – PKeidel Oct 18 '18 at 08:51

3 Answers3

1

Assuming open job definition is until is null

WITH open_jobs_cte as (select events_id, customer_id, count(*) open_jobs FROM jobs WHERE until is null group by 1,2)
SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, count(j.id) as total_jobs, open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
LEFT JOIN open_jobs_cte oj ON oj.events_id = e.id
GROUP BY e.id, e.customers_id

Based on your requirements/design customer can be excluded from cte

demircioglu
  • 3,069
  • 1
  • 15
  • 22
  • Sorry, `open job` is a job with `users_id = NULL`. But that can be changed easily. I'll give it a try later when I'm at home. Never heard of `WITH (cte)`, thanks for your answer – PKeidel Oct 18 '18 at 09:06
  • If I remove customer_id (since it's in `events`) and change it to `users_id is null`, it gives me the error `ERROR: column "oj.open_jobs" must appear in the GROUP BY clause or be used in an aggregate function` – PKeidel Oct 18 '18 at 16:48
1

Assuming open_jobs is the number of jobs with user_id as null?

SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs,sum(case user_id is null then 1 else 0) as open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id

Notice the sum(case user_id is null then 1 else 0). If the user_id is null then you have an open job, so you add 1, otherwise you add 0.

EDIT: Since you also need to sort by min_from, you can

select *from
(
SELECT e.id, e.customers_id, min(j.from) as min_from, max(j.until) as max_until, 
  count(j.id) as total_jobs,sum(case user_id is null then 1 else 0) as open_jobs
FROM events e
LEFT JOIN jobs j ON j.events_id = e.id
GROUP BY e.id, e.customers_id
) as results 
order by min_from
kkica
  • 4,034
  • 1
  • 20
  • 40
1

While getting all or most events, it's typically fastest (and simplest) to aggregate the n-table before the join:

SELECT e.id, e.customers_id
     , j.min_from
     , j.max_until
     , j.total_jobs
     , j.open_jobs
FROM   events e
LEFT   JOIN (
   SELECT events_id    AS id -- alias only to ease join syntax
        , min("from")  AS min_from
        , max(until)   AS max_until
        , count(*)     AS total_jobs
        , count(*) FILTER (WHERE users_id IS NULL) AS open_jobs
-- equivalent for old versions:
--      , count(users_id IS NULL OR NULL)          AS open_jobs
   FROM   jobs
   GROUP  BY 1
   )  j USING (id);

This way you don't need to GROUP BY the 1-table at all.

And since you were considering a second LEFT JOIN: If you don't aggregate first, you'd run into a "proxy cross join" situation with this.

The aggregate FILTER clause requires Postgres 9.4 or later.

Related:

Aside: never use reserved words like from as column names, unless you fancy complications ...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I'll give it a try later. A proxy cross join could be the situation I ran into. `FILTER` looks great, I've never heard of it before. – PKeidel Oct 18 '18 at 10:28