group by on a range

Question

i have a table with employee names and their vendor experiences. i have to create a table with the following data

data given to me is like

empname vendor experience
a        1
b        2
c        10
d        11
e        20
f        12
g        21
h        22

i want to generate a SQL query to display data like this

vendor_experience(months)   count
0-6                          2
0-12                         5
0-18                         5
more                         8

please help me with the query.

Take a look at http://stackoverflow.com/questions/2483140/oracle-how-to-group-by-over-a-range - it's kind of similar. — Piotr Zierhoffer, Sep 25 '12 at 13:05
Do you really want to double count the people in multiple ranges? — Gordon Linoff, Sep 25 '12 at 13:16

Nikola Markovinović · Accepted Answer · 2012-09-26T10:15:24.470

You might employ case statement to get counts of exclusive ranges:

select case when [vendor experience] <= 6 then '0-6'
            when [vendor experience] <= 12 then '0-12'
            when [vendor experience] <= 18 then '0-18'
            else 'more'
        end [vendor_experience(months)],
       count (*) [count]
  from experiences
 group by
       case when [vendor experience] <= 6 then '0-6'
            when [vendor experience] <= 12 then '0-12'
            when [vendor experience] <= 18 then '0-18'
            else 'more'
        end

This produces the same result as yours (inclusive ranges):

; with ranges as 
  (
    select 6 as val, 0 as count_all
    union all
    select 12, 0
    union all
    select 18, 0
    union all
    select 0, 1
  )
select case when ranges.count_all = 1
            then 'more'
            else '0-' + convert (varchar(10), ranges.val) 
        end [vendor_experience(months)],
       sum (case when ranges.count_all = 1 
                   or experiences.[vendor experience] <= ranges.val 
                 then 1 end) [count]
  from experiences
 cross join ranges
 group by ranges.val, ranges.count_all

count_all is set to 1 to mark open-ending range.

Sql Fiddle is here.

UPDATE: an attempt at explanation.

The first part starting with with and ending with closing bracket is called CTE. Sometimes it is referred to as inline view because it can be used multiple times in the same query and under some circumstances is updateable. Here it is used to prepare data for ranges and is appropriately named ranges. This name one uses in main query. Val is maximum value of a range, count_all is 1 if range has no upper end (18+, more, or however you wish to call it). Data rows are combined by means of union all. You might copy/paste section between parenthesis only and run it just to see the results.

Main body joins experiences table with ranges using cross join. This creates combinations of all rows from experiences and ranges. For row d 11 there will be 4 rows,

empname vendor experience val count_all
d       11                  6 0
d       11                 12 0
d       11                 18 0
d       11                  0 1

First case statement in select list produces caption by checking count_all - if it is one, outputs more, else constructs caption using upper range value. Second case statement counts using sum(1). As aggregate functions ignore nulls, and case having no else evaluates to null if match was not found, it is sufficient to check if count_all is true (meaning that this row from experiences is counted in this range) or if vendor experience is less or equal to upper range value of current range. In example above 11 will not be counted for first range but will be counted for all the rest.

Results are then grouped by val and count_all. To better see how it works you might remove group by and sum() and look at numbers before aggregation. Order by empname, val will help to see how values of [count] change depending on different val per an employee.

Note: I did my best with my current level of english language. Please don't hesitate to ask for clarification if you need one (or two, or as many as you need).

@GordonLinoff I was too hasty, this does not double-count. Lower range is not needed because it is dealt with by `when` ordering. Wander if OP actually wants to double-count. — Nikola Markovinović, Sep 25 '12 at 13:20
@NikolaMarkovinović - can you please explain how this query works. i totally understand you must be annoyed with that but i am having hard time understanding the concept behind it.please please help me out. — SQLify, Sep 26 '12 at 06:42
Hi, I'm on mobile ATM so I need to keep it short. I'll gladly try to explain the query later. Which query you ended up using? — Nikola Markovinović, Sep 26 '12 at 07:23
@NikolaMarkovinović : i required the query that considered inclusive ranges i.e., the second one — SQLify, Sep 26 '12 at 07:32
@NikolaMarkovinović i got the concept.. really nice . thanks a lot.. the problem is that ia m getting 1 more than the actual value in my table. like for 0-6 i should get 90 but i am getting 91. no idea whatsoever why that is happening. — SQLify, Sep 26 '12 at 11:32
Perhaps you want to include 6 in next group (6 - 12)? In that case `<` instead of <= will do. If not, are you willing to duplicate your situation at [Sql Fiddle](http://sqlfiddle.com)? And then post the link here? — Nikola Markovinović, Sep 26 '12 at 11:59
Hey sorry it was a tiny data issue. thanks a lot for the patience and efficiency of your answers @NikolaMarkovinović — SQLify, Sep 26 '12 at 12:56

Kapil Khandelwal · Answer 2 · 2012-09-25T13:52:08.630

Try this:

   INSERT INTO ResultTable ([vendor_experience(months)], count)
    Select *FROM
    (
    (SELECT  '0-6', Count(*) From TableA WHERE [vendor experience] <= 6
    UNION ALL
    SELECT  '0-12', Count(*) From TableA  WHERE [vendor experience] <= 12
    UNION ALL
    SELECT  '0-18', Count(*) From TableA  WHERE [vendor experience] <= 18
     UNION ALL
    SELECT  'more', Count(*) From TableA) as Temp
    )

If duplicate counts not needed, then try this:

select t.[vendor_experience(months)], count(*) as count
from (
  select case  
    when [vendor experience] between 0 and 6 then ' 0-6'
    when [vendor experience] between 7 and 12 then '0-12'
    when [vendor experience] between 13 and 18 then '0-18'
    when [vendor experience] >= 19 then 'more'
    else 'other' end as [vendor_experience(months)]
  from TableA) t
group by t.[vendor_experience(months)]

If vendor experience is not indexed, this will lead to four very expensive scans. UNION will also cause sort operators that aren't necessary, and wouldn't be introduced with UNION ALL. — Aaron Bertrand, Sep 25 '12 at 13:15

score 0 · Answer 3 · answered Sep 25 '12 at 13:19

A bit more dynamic, implement a table for the groupings:

create table #t (name varchar(10),e int)

insert into #t values ('a',0)
insert into #t values ('b',4)
insert into #t values ('c',3)
insert into #t values ('d',13)
insert into #t values ('e',25)
insert into #t values ('f',4)
insert into #t values ('g',19)
insert into #t values ('h',15)
insert into #t values ('i',7)


create table #g (t int, n varchar(10))

insert into #g values (6, '0-6') 
insert into #g values (12, '0-12')
insert into #g values (18, '0-18')
insert into #g values (99999, 'more')

select #g.n
,COUNT(*) 
from #g
inner join #t on #t.e <= #g.t
group by #g.n

you might want to play around with the value 99999 for example.

score 0 · Answer 4 · answered Sep 25 '12 at 13:35

Here is a way to get the cumulative values:

select sum(mon0_6) as mon0_6, sum(mon0_12) as mon0_12, sum(mon0_18) as mon0_18,
       sum(more) as more
from (select e.*,
             (case when [vendor experience] <= 6 then 1 else 0 end) as mon0_6,
             (case when [vendor experience] <= 12 then 1 else 0 end) as mon0_12,
             (case when [vendor experience] <= 18 then 1 else 0 end) as mon0_18,
             1 as more
     ) e

This puts them in separate columns. You can then use unpivot to put them in separate rows.

However, you might consider doing the cumulative sum at the application layer. I often do this sort of thing in Excel.

Doing a cumulative sum in SQL Server 2008 requires a self-join, either explicitly or via a correlated subquery. SQL Server 2012 supports much simpler syntax for cumulative sums (the over clause takes an order by argument).

group by on a range

4 Answers4