2

I need to select a count of items that fall into some range like this

create table numbers (val int);

insert into numbers(val) values (2), (3), (11), (12), (13), (31);

select count(1) as qty , val / 10 as range
from numbers
group by val / 10;

Obviously, if there're no items in range, it won't be included in the output. I could think of some inelegant ways to include all ranges in the output, but is there an elegant and fast one (in PostgreSQL or MS SQL Server dialect)

podiluska
  • 50,950
  • 7
  • 98
  • 104
synapse
  • 5,588
  • 6
  • 35
  • 65
  • It's a little hard to tell what you want. Are you trying to group in 10s, ie "0-9", "10-19", etc? – Craig Ringer Oct 05 '12 at 10:21
  • We have had two closely related questions recently. [Here](http://stackoverflow.com/questions/12641764/aggregating-x-y-coordinate-point-clouds-in-postgresql) and [here](http://stackoverflow.com/questions/12730072/truncate-timestamp-to-arbitrary-intervals). – Erwin Brandstetter Oct 05 '12 at 21:46

4 Answers4

5

It appears that you want to create a histogram of your results.

PostgreSQL:

select x, count(val)
from generate_series(1,6) x 
left outer join numbers on (x = width_bucket(val, 0, 60, 6))
group by x;

I've used width_bucket instead of a simple division and modulus because it's more general and easier to get right for more complex ranges. Also, it's awesome.

Mark Bannister's recursive CTE for sequence generation can be integrated and JOINed on as x instead of the generate_series for added portability if you want, and the limits can be automatically determined:

with recursive ranges(rangev) as (
    select 0 rangev union all select rangev+1 as rangev from ranges where rangev < 4
), bounds(lower_bucket, upper_bucket) as (
    select (min(val))/10, (max(val)/10)+1 from numbers
)
select
    rangev as bucket,
    rangev*10 AS lower_bound,
    (rangev+1)*10-1 AS upper_bound,
    count(val) AS num_in_bucket
from ranges cross join bounds
left outer join numbers on (rangev = width_bucket(val, lower_bucket, upper_bucket*10, upper_bucket))
group by rangev
order by rangev asc;

If you prefer /10 over width_bucket (say, if width_bucket isn't available in MS SQL) that's easy to change back.

Output:

 bucket | lower_bound | upper_bound | num_in_bucket 
--------+-------------+-------------+---------------
      0 |           0 |           9 |             0
      1 |          10 |          19 |             2
      2 |          20 |          29 |             3
      3 |          30 |          39 |             0
      4 |          40 |          49 |             1
(5 rows)
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
4

Postgresql 9.2 also implements the range type:

  SELECT 
    range.r, 
    count(val.n) 
  FROM 
    (VALUES (int4range '[0, 10)'), (int4range '[10, 20)'), (int4range '[20, 30)'), (int4range '[30, 40)')) range (r) 
     LEFT JOIN (VALUES (2), (3), (9), (11), (17), (31), (33)) AS val (n) ON  val.n <@   range.r
  GROUP BY
    range.r
  ORDER BY r ASC

┌─────────┬───────┐
│    r    │ count │
├─────────┼───────┤
│ [0,10)  │     3 │
│ [10,20) │     2 │
│ [20,30) │     0 │
│ [30,40) │     2 │
└─────────┴───────┘
greg
  • 3,354
  • 1
  • 24
  • 35
3

Try:

with ranges as 
(select 0 rangev union all 
 select rangev+1 as rangev 
 from ranges 
 where rangev < 4)
select count(val) as qty , rangev
from ranges 
left join numbers on rangev = val / 10
group by rangev;

(SQLServer version - change with ranges to with recursive ranges for PostgresQL)

  • Blast, I was about to get to a recursive CTE example for generating the number series, but you beat me to it. Does MS SQL Server really not understand, or assume, `RECURSIVE` in CTEs? – Craig Ringer Oct 05 '12 at 10:31
  • @CraigRinger the above is SQL server, i think. If anything, it's postgres that doesn't assume recursive. – podiluska Oct 05 '12 at 10:46
  • 1
    @podiluska: It's as you say - PostgresQL needs an explicit `recursive`, SQLServer assumes it; see this SQLFiddle: http://sqlfiddle.com/#!3/58aaf/4 –  Oct 05 '12 at 10:55
  • 1
    @podiluska: the `recursive` keyword is required by the SQL standard if you are doing a recursive CTE. –  Oct 05 '12 at 11:22
1

For MS SQL Server

declare @min int, @max int
    select @min = MIN(val/10), @max = MAX(val/10) from numbers

    select ranges.number as range, COUNT(val) as qty from master.dbo.spt_values ranges
        left join numbers on ranges.number = numbers.val/10
    where type='p' and number between @min and @max
    group by number    

The same principle could be applied to Postgres although the generation of the numbers list would be different

podiluska
  • 50,950
  • 7
  • 98
  • 104
  • PostgreSQL does not have this notion of "inline code" as SQL Server, so the principle cannot really be applied to PostgreSQL –  Oct 05 '12 at 11:23