0

I have a table employee with columns:

  • company_id
  • employee_id
  • opted_out_on (date)
  • opt_out_window_starts_on (date)

Both are employee-related so the vary between employees in the same company. I want to have the number of employees who opted out where the month of the opted_out_on equals the month of the opt_out_window_starts_on, grouped by company size. I wrote the below query but the figures were wrong.

Source data looks like:

company_id | employee_id | opt_out_on | opt_out_window_starts_on
-----------+-------------+------------+-------------------------
     23101 |   920190500 | 20/01/2017 | 09/01/2017
     12264 |   920190508 | 02/03/2017 | 04/02/2017
     12039 |   920190513 | 07/11/2017 | 31/10/2017
     12039 |   920190516 | 13/02/2017 | 11/02/2017
     23109 |   920190520 | 06/01/2017 | 06/01/2017
     21875 |   920190529 | 22/05/2017 | 15/05/2017
     21875 |   920190532 | 19/05/2017 | 15/05/2017
     21875 |   920190533 | 22/05/2017 | 15/05/2017
     21875 |   920190534 | 15/05/2017 | 15/05/2017
     21875 |   920190535 | 22/05/2017 | 15/05/2017
     21875 |   920190536 | 19/05/2017 | 15/05/2017
     18112 |   920190545 | 11/01/2017 | 05/01/2017
      9764 |   920190546 | 02/05/2017 | 02/05/2017
     18365 |   920190665 | 30/03/2017 | 28/03/2017
     18365 |   920190668 | 28/03/2017 | 28/03/2017
     18365 |   920190678 | 28/03/2017 | 28/03/2017
     20928 |   920190699 | 14/06/2017 | 13/06/2017
     20928 |   920190705 | 20/06/2017 | 13/06/2017
     16860 |   920190717 | 17/01/2017 | 16/01/2017
     16860 |   920190718 | 16/01/2017 | 16/01/2017

This data is available on SQL Fiddle.

My code is:

with size as 
(
    select distinct company_id
    , date_trunc('month', opt_out_window_starts_on) :: date months
    , case 
        when count(id) over (partition by company_id ) = 1 then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "one"
    , case 
        when count(id) over (partition by company_id ) = 2 then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "two"
    , case 
        when 
            count(id) over (partition by company_id ) >= 3 
            and count(id) over (partition by company_id ) <= 4
        then 
            count(id) filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "3-4"
    , case 
        when 
            count(id) over (partition by company_id ) >= 5 
            and count(id) over (partition by company_id ) <= 9
        then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "5-9"
    , case 
        when 
            count(id) over (partition by company_id ) >= 10 
            and count(id) over (partition by company_id ) <= 29
        then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "10-29"
    , case 
        when 
            count(id) over (partition by company_id ) >= 30 
            and count(id) over (partition by company_id ) <= 49
        then 
            count(id)
            filter (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "30-49"
    , case 
        when 
            count(id) over (partition by company_id) >= 50 
            and count(id) over (partition by company_id) <= 249
        then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "50-249"
    , case 
        when 
            count(id) over (partition by company_id ) >= 250
        then 
            count(id)
            filter 
            (
                where opt_state = 'opted_out' 
                and 
                (
                    date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
                )
            )
        else 0 
    end as "250+"
    from employees
    group by company_id
    , id
) 
select to_char(date_trunc('month', months),'YYYY MON')
, sum(one)
, sum(two)
, sum("3-4")
, sum("5-9")
, sum("10-29")
, sum("30-49")
, sum("50-249")
    , sum("250+")
from size
where to_char(date_trunc('month', months),'YYYY MON') is not null
group by date_trunc('month', months)
order by date_trunc('month', months) asc
;

Sample of the wrong outcome:

  to_char  | sum | sum | sum | sum | sum | sum | sum | sum
  ---------+-----+-----+-----+-----+-----+-----+-----+----
  2017 JAN |  35 |  37 |  85 | 119 | 131 |  39 |  19 |   3
  2017 FEB |  49 |  53 | 112 | 165 | 170 |  41 |  23 |   2
  2017 MAR |  29 |  33 |  62 |  85 | 104 |  24 |  23 |   1
  2017 APR |  37 |  50 | 117 | 150 | 159 |  33 |  28 |   3
  2017 MAY |  71 |  84 | 150 | 182 | 208 |  51 |  44 |   4
  2017 JUN |  35 |  51 |  80 |  90 | 128 |  26 |  43 |   7
  2017 JUL |  69 |  84 | 146 | 177 | 173 |  51 |  45 |   5
  2017 AUG |  93 | 137 | 168 | 194 | 234 |  63 |  51 |   5
  2017 SEP |  75 |  60 |  97 | 127 | 136 |  44 |  44 |   6
  2017 OCT |  91 | 113 | 132 | 157 | 126 |  38 |  49 |   6
  2017 NOV | 125 | 158 | 168 | 198 | 162 |  36 |  58 |  11
  2017 DEC |  49 |  81 |  83 | 103 |  91 |  27 |  40 |  10
  2018 JAN | 134 | 132 | 119 | 158 | 132 |  37 |  41 |  12
  2018 FEB | 127 | 111 | 155 | 142 | 121 |  37 |  45 |  15
  2018 MAR | 112 |  92 | 119 | 115 | 116 |  35 |  37 |  15
  2018 APR |  37 |  51 |  64 |  57 |  56 |  14 |  26 |  10
  2018 MAY |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0

If you take Jan 2017 as an example the total number of employees how opted out is 1404 while my query shows 468 (35 + 37 + 85 + 119 + 131 + 39 + 19 + 3).

I checked the nummber against:

select count(id)
, date_trunc('month', opt_out_window_starts_on)
from employees
where opt_state = 'opted_out'
and date_trunc('month', opt_out_window_starts_on) = date_trunc('month', opt_out_on)
group by date_trunc('month', opt_out_window_starts_on)
JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
ryan
  • 101
  • 9
  • 1
    show us sample data and sample result. Then pin point where it is wrong and put what is the correct result. – jose_bacoy Apr 17 '18 at 12:42
  • hi, sorry how can I add a data sample? thank you – ryan Apr 17 '18 at 12:45
  • IMHO seems like put in SQL things should be ported to 'view' layer, f.o. report engine – Jacek Cz Apr 17 '18 at 12:48
  • You can pick sample data (or dummy data) in your table then paste it in your SO post/question. Then show which data/result is not correct and what do you expect on the result. Look at some other post/question to get an idea. – jose_bacoy Apr 17 '18 at 12:48
  • Alternatively, you can set up a SQL Fiddle and put the sample data there. – Gordon Linoff Apr 17 '18 at 13:02
  • I've just edited it, please let me know if that's better or make more sense – ryan Apr 17 '18 at 13:20

1 Answers1

1

If I've understood your question correctly you want:

  • Columns to represent given ranges of company sizes (i.e. their total number of employees)
  • Rows giving year and month
  • The value of each cell to represent the number of employees in those companies who have opted out of something in the same month in which they were offered the option to opt out.

This code tackles the first part of this question; getting a list of companies, their sizes, and the number of leavers:

select company_id
, date_trunc('month', opt_out_window_starts_on) optOutMonth
, count(employee_id) as companySize
, count(case 
    when opt_state = 'opted_out'
    and date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
    then employee_id
    else null
end)
 as leaversInOptOutWindow
from employee
group by company_id, date_trunc('month', opt_out_window_starts_on)

SQL Fiddle

You can then join this with another table in which your ranges are defined to group the companies into sizes, and perform a pivot over the results.

Sadly Postgres does not support pivot tables, but it seems the same can be achieved via the crosstab function: PostgreSQL Crosstab Query

I've not got this to work in SQL Fiddle; not sure whether that's down to an issue in my code, or lack of support for this function (as I'm not familiar enough with Postgres)... However I believe this should be in the area of what you're after:

SELECT * FROM crosstab(
'
  select e.optOutMonth
    , s.name
    , sum(e.leaversInOptOutWindow) LeaverCount
    from
    (
        select company_id
        , date_trunc(''month'', opt_out_window_starts_on) optOutMonth
        , count(employee_id) as companySize
        , count(case 
            when opt_state = ''opted_out''
            and date_trunc(''month'', opt_out_on) = date_trunc(''month'', opt_out_window_starts_on)
            then employee_id
            else null
        end) as leaversInOptOutWindow
        from employee 
        group by company_id
        , date_trunc(''month'', opt_out_window_starts_on)
    ) e
    left outer join companySizeRange s
    on s.minSize <= e.companySize
    and (s.maxSize is null or s.maxSize >= e.companySize)
    group by e.optOutMonth, s.name
    order by e.optOutMonth, s.name
'
  , 'select name from companySizeRange order by minSize'
 ) as x(
   optOutMonth date, "1" int, "2" int, "3-4" int, "5-9" int, "10-29" int, "30-49" int, "50-249" int, "250+" int
 );

SQL Fiddle


Update

Here's a version that doesn't need crosstab / pivot:

select e.optOutMonth
, sum(case when s.companySize = 1 then e.leaversInOptOutWindow else 0 end) "1"
, sum(case when s.companySize = 2 then e.leaversInOptOutWindow else 0 end) "2"
, sum(case when s.companySize between 3 and 4 then e.leaversInOptOutWindow else 0 end) "3-4"
, sum(case when s.companySize between 5 and 9 then e.leaversInOptOutWindow else 0 end) "5-9"
, sum(case when s.companySize between 10 and 29 then e.leaversInOptOutWindow else 0 end) "10-29"
, sum(case when s.companySize between 30 and 49 then e.leaversInOptOutWindow else 0 end) "30-49"
, sum(case when s.companySize between 50 and 249 then e.leaversInOptOutWindow else 0 end) "50-249"
, sum(case when s.companySize >= 250 then e.leaversInOptOutWindow else 0 end) "250+"
from
(
    select company_id
    , count(employee_id) as companySize
    from employee 
    group by company_id
) s
inner join
(
    select company_id
    , date_trunc('month', opt_out_window_starts_on) optOutMonth
    , count(employee_id) as leaversInOptOutWindow
    from employee 
    where opt_state = 'opted_out'
    and date_trunc('month', opt_out_on) = date_trunc('month', opt_out_window_starts_on)
    group by company_id
    , date_trunc('month', opt_out_window_starts_on)
) e
on e.company_id = s.company_id
group by e.optOutMonth
order by e.optOutMonth

SQL Fiddle

JohnLBevan
  • 22,735
  • 13
  • 96
  • 178
  • yes that's exactly what i wanted , waiting you update thank you so much – ryan Apr 17 '18 at 14:36
  • Updated but haven't yet got it working in SQL Fiddle... so maybe I'm doing something wrong; or maybe this functionality's not supported... sadly I'm not entirely sure... – JohnLBevan Apr 17 '18 at 14:51
  • thank you so much for trying, it errored [42P01] ERROR: relation "companysizerange" does not exist – ryan Apr 17 '18 at 15:02
  • No worries. Take a look at the left pane of SQL Fiddle; that's a new table I created to define your ranges (not strictly needed; but I felt that simplifies the code & maintenance a little). – JohnLBevan Apr 17 '18 at 15:08
  • this is perfect it worked and the outcome is 100% correct, I really appreciate your help @ JohnLBevan – ryan Apr 17 '18 at 15:37
  • hi @JohanlBevan i just realised that the query isn't 100 % correct, the total number is correct but when it comes to company size it not showing the correct size of the company for example in your query you are grouping per company_id and month then the company size became the number of employees in the month where employee has opted out – ryan Apr 18 '18 at 09:06
  • @ryan very good point; apologies; I missed that when tweaking; corrected (the last query; older ones were works in progress anyway)... – JohnLBevan Apr 18 '18 at 09:29