I have a table employee with columns:
- company_id
- employee_id
- opted_out_on (date)
- opt_out_window_starts_on (date)
Both are employee-related so the vary between employees in the same company.
I want to have the number of employees who opted out where the month of the opted_out_on
equals the month of the opt_out_window_starts_on
, grouped by company size.
I wrote the below query but the figures were wrong.
Source data looks like:
company_id | employee_id | opt_out_on | opt_out_window_starts_on
-----------+-------------+------------+-------------------------
23101 | 920190500 | 20/01/2017 | 09/01/2017
12264 | 920190508 | 02/03/2017 | 04/02/2017
12039 | 920190513 | 07/11/2017 | 31/10/2017
12039 | 920190516 | 13/02/2017 | 11/02/2017
23109 | 920190520 | 06/01/2017 | 06/01/2017
21875 | 920190529 | 22/05/2017 | 15/05/2017
21875 | 920190532 | 19/05/2017 | 15/05/2017
21875 | 920190533 | 22/05/2017 | 15/05/2017
21875 | 920190534 | 15/05/2017 | 15/05/2017
21875 | 920190535 | 22/05/2017 | 15/05/2017
21875 | 920190536 | 19/05/2017 | 15/05/2017
18112 | 920190545 | 11/01/2017 | 05/01/2017
9764 | 920190546 | 02/05/2017 | 02/05/2017
18365 | 920190665 | 30/03/2017 | 28/03/2017
18365 | 920190668 | 28/03/2017 | 28/03/2017
18365 | 920190678 | 28/03/2017 | 28/03/2017
20928 | 920190699 | 14/06/2017 | 13/06/2017
20928 | 920190705 | 20/06/2017 | 13/06/2017
16860 | 920190717 | 17/01/2017 | 16/01/2017
16860 | 920190718 | 16/01/2017 | 16/01/2017
This data is available on SQL Fiddle.
My code is:
with size as
(
select distinct company_id
, date_trunc('month', opt_out_window_starts_on) :: date months
, case
when count(id) over (partition by company_id ) = 1 then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "one"
, case
when count(id) over (partition by company_id ) = 2 then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "two"
, case
when
count(id) over (partition by company_id ) >= 3
and count(id) over (partition by company_id ) <= 4
then
count(id) filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "3-4"
, case
when
count(id) over (partition by company_id ) >= 5
and count(id) over (partition by company_id ) <= 9
then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "5-9"
, case
when
count(id) over (partition by company_id ) >= 10
and count(id) over (partition by company_id ) <= 29
then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "10-29"
, case
when
count(id) over (partition by company_id ) >= 30
and count(id) over (partition by company_id ) <= 49
then
count(id)
filter (
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "30-49"
, case
when
count(id) over (partition by company_id) >= 50
and count(id) over (partition by company_id) <= 249
then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "50-249"
, case
when
count(id) over (partition by company_id ) >= 250
then
count(id)
filter
(
where opt_state = 'opted_out'
and
(
date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
)
)
else 0
end as "250+"
from employees
group by company_id
, id
)
select to_char(date_trunc('month', months),'YYYY MON')
, sum(one)
, sum(two)
, sum("3-4")
, sum("5-9")
, sum("10-29")
, sum("30-49")
, sum("50-249")
, sum("250+")
from size
where to_char(date_trunc('month', months),'YYYY MON') is not null
group by date_trunc('month', months)
order by date_trunc('month', months) asc
;
Sample of the wrong outcome:
to_char | sum | sum | sum | sum | sum | sum | sum | sum
---------+-----+-----+-----+-----+-----+-----+-----+----
2017 JAN | 35 | 37 | 85 | 119 | 131 | 39 | 19 | 3
2017 FEB | 49 | 53 | 112 | 165 | 170 | 41 | 23 | 2
2017 MAR | 29 | 33 | 62 | 85 | 104 | 24 | 23 | 1
2017 APR | 37 | 50 | 117 | 150 | 159 | 33 | 28 | 3
2017 MAY | 71 | 84 | 150 | 182 | 208 | 51 | 44 | 4
2017 JUN | 35 | 51 | 80 | 90 | 128 | 26 | 43 | 7
2017 JUL | 69 | 84 | 146 | 177 | 173 | 51 | 45 | 5
2017 AUG | 93 | 137 | 168 | 194 | 234 | 63 | 51 | 5
2017 SEP | 75 | 60 | 97 | 127 | 136 | 44 | 44 | 6
2017 OCT | 91 | 113 | 132 | 157 | 126 | 38 | 49 | 6
2017 NOV | 125 | 158 | 168 | 198 | 162 | 36 | 58 | 11
2017 DEC | 49 | 81 | 83 | 103 | 91 | 27 | 40 | 10
2018 JAN | 134 | 132 | 119 | 158 | 132 | 37 | 41 | 12
2018 FEB | 127 | 111 | 155 | 142 | 121 | 37 | 45 | 15
2018 MAR | 112 | 92 | 119 | 115 | 116 | 35 | 37 | 15
2018 APR | 37 | 51 | 64 | 57 | 56 | 14 | 26 | 10
2018 MAY | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
If you take Jan 2017 as an example the total number of employees how opted out is 1404 while my query shows 468 (35 + 37 + 85 + 119 + 131 + 39 + 19 + 3
).
I checked the nummber against:
select count(id)
, date_trunc('month', opt_out_window_starts_on)
from employees
where opt_state = 'opted_out'
and date_trunc('month', opt_out_window_starts_on) = date_trunc('month', opt_out_on)
group by date_trunc('month', opt_out_window_starts_on)