0

I'm working with a query that looks like this:

SELECT
  CASE 
    WHEN num BETWEEN 1 AND 5 THEN '1-5'
    WHEN num BETWEEN 6 AND 10 THEN '6-10'
    WHEN num BETWEEN 11 AND 20 THEN '11-20'
    WHEN num > 20 THEN '20+'
END AS bucket,
COUNT(1)
FROM data
GROUP BY 1

...

I'd like to order the results by the different values of bucket so the results look something like this

bucket count(1)
1-5 10
6-10 11
11-20 17
20+ 8

Since bucket is a string, simply sorting by that column doesn't do the trick.

Are there any string operations in SQL that can help me? For reasons I won't get into the CASE statements are the easiest way for me to get results in the format I need, so I'd rather deal with a bunch of string operations to get it in the order I want than change the structure of the query

halez
  • 1
  • 1
  • Welcome to SO. You could go with 01-05, 06-10, 11-20, 20+ as your strings. – topsail Jun 13 '22 at 23:46
  • `GROUP BY 1` <-- [You should change this to use the explicit column name](https://stackoverflow.com/questions/7392730/what-does-sql-clause-group-by-1-mean) - also `COUNT(1)` [is not idiomatic](https://stackoverflow.com/questions/1221559/count-vs-count1-sql-server), is there a reason you're not using `COUNT(*)`? – Dai Jun 13 '22 at 23:47
  • Is `num` an integer or non-integer value? If it's non-integer, what bucket should `5.5` go into? – Dai Jun 13 '22 at 23:50
  • Yeah, If this is going to be a production query or a query you intend to keep using for a while, it is better to make the grouped on column explicit by name - it makes the intention of the query more clear and protects it from unitentional bugs if there are future changes in the database table structure. Indeed - it is a little hard to be sure what this query means with that unnamed field in there just as (1) - is it the buckets, or another field? I am not really sure! – topsail Jun 13 '22 at 23:53

2 Answers2

1

My approach uses integer lower-bounds for the buckets (as integers are nicer to deal with than strings), and only generates the string bucket names at the very end, after grouping.

WITH histogramData AS (
    SELECT
        CASE
            WHEN num <  1 THEN NULL
            WHEN num <  6 THEN 1
            WHEN num < 11 THEN 6
            WHEN num < 21 THEN 11 ELSE 21
        END AS Bucket,
        data.*
    FROM
        data
),
grouped AS (
    SELECT
        h.Bucket,
        COUNT(*) AS [Count]
    FROM
        histogramData AS h
    GROUP BY
        h.Bucket
)
SELECT
    CASE ISNULL( g.Bucket, 0 )
        WHEN  0 THEN NULL,
        WHEN  1 THEN '1-5'
        WHEN  6 THEN '6-10'
        WHEN 11 THEN '11-20' ELSE '21+'
    END AS [Bucket Name],
    g.[Count]
FROM
    grouped
ORDER BY
    g.Bucket 
Dai
  • 141,631
  • 28
  • 261
  • 374
  • Is this solution faster than your first proposed solution? (despite the two ctes) @Dai – lemon Jun 14 '22 at 00:29
  • 1
    @lemon The only way to find out is by running it and looking at the query execution plan. – Dai Jun 14 '22 at 00:34
0
SELECT
  CASE 
    WHEN num BETWEEN 1 AND 5 THEN '01-05'
    WHEN num BETWEEN 6 AND 10 THEN '06-10'
    WHEN num BETWEEN 11 AND 20 THEN '11-20'
    WHEN num > 20 THEN '20+'
END AS bucket,
COUNT(1)
FROM data
GROUP BY 1
topsail
  • 2,186
  • 3
  • 17
  • 17