group by range in mysql

Question

Table:   
new_table                                                    
user_number  | diff                  
     2       |  0                      
     1       |  28  
     2       |  32  
     1       |  40  
     1       |  53  
     1       |  59  
     1       |  101  
     1       |  105  
     2       |  108  
     2       |  129  
     2       |  130    
     1       |  144  


            |(result)
            v

range  | number of users  
0-20   |  2  
21-41  |  3  
42-62  |  1  
63-83  |  2  
84-104 |  1  
105-135|  0  
136-156|  3


select t.range as [range], count(*) as [number of users]  
from (  
  select case    
    when diff between 0 and 20 then ' 0-20'  
    when diff between 21 and 41 then ' 21-41'  
    when diff between 42 and 62 then ' 42-62'  
    when diff between 63 and 83 then ' 63-83'  
    when diff between 84 and 104 then ' 84-104'  
    when diff between 105 and 135 then ' 105-135'  
    else '136-156'   
     end as range  
  from new_table) t  
group by t.diff  

Error:

You have an error in your SQL syntax, near '[range], count(*) as [number of users]  
from (  
  select case  
    when' at line 1

Btw, your between-part needs to be changed according ranges. you have the same 0-20 range definition each line currently. — zerkms, Jul 14 '11 at 01:41
sorry for that you are right it should be accoring to the ranges — Soumya, Jul 14 '11 at 17:36
I was able to get this working but changing the "GROUP BY" query yo be: GROUP BY t.range — ecairol, Sep 10 '13 at 17:36
Thanks for this. I actually needed your query rather than the shorter solutions below so I could do non-regular intervals. — Michael K, Nov 13 '14 at 23:08

score 56 · Answer 1 · edited Mar 01 '19 at 22:42

56

Here is general code to group by range since doing a case statement gets pretty cumbersome.

The function 'floor' can be used to find the bottom of the range (not 'round' as Bohemian used), and add the amount (19 in the example below) to find the top of the range. Remember to not overlap the bottom and top of the ranges!

mysql> create table new_table (user_number int, diff int);
Query OK, 0 rows affected (0.14 sec)

mysql>  insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53),
        (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144);
Query OK, 12 rows affected (0.01 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql> select concat(21*floor(diff/21), '-', 21*floor(diff/21) + 20) as `range`,
       count(*) as `number of users` from new_table group by 1 order by diff;
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               3 |
| 42-62   |               2 |
| 84-104  |               1 |
| 105-125 |               2 |
| 126-146 |               3 |
+---------+-----------------+
6 rows in set (0.01 sec)

edited Mar 01 '19 at 22:42

jonasfh

4,151
2
21
37

answered Nov 01 '12 at 22:32

Josh

561
4
3

2

This is a great answer, and it illustrates an important approach to solving difficult query problems - don't forget that you can recode your data in a query to get what you want. In this case, you are mapping the number (here in the `diff` column) into the category you want, and then grouping on the category. Much better than having to build a separate table or explicitly write out a case statement. – D Mac Mar 20 '15 at 13:08
Does anyone know how to get this working when the values are decimal numbers? – Alessandro May 22 '16 at 13:52
1

Having hard time understanding the query. Great if anyone can explain. – Akshay Aug 28 '17 at 14:34
Cant get results... /* Error SQL (1055): Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'new_table.diff' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by */ – wtfowned Jul 03 '20 at 14:24
Works with SET SESSION sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY','')); – wtfowned Jul 03 '20 at 14:25

score 9 · Answer 2 · answered Jul 14 '11 at 02:06

Here's a solution that will work for any magnitude of diff:

select
  concat(21 * round(diff / 21), '-', 21 * round(diff / 21) + 20) as `range`,
  count(*) as `number of users`
from new_table
group by 1
order by diff;

Here's some testable code and its output:

create table new_table (user_number int, diff int);
insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53), (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144); 
-- run query, output is: 
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               1 |
| 42-62   |               2 |
| 63-83   |               2 |
| 105-125 |               3 |
| 126-146 |               2 |
| 147-167 |               1 |
+---------+-----------------+

@khunshan 21 is the size of the data divisions. 20 is what you add to the start of a division to get the last value in a division. Alternatively I could have multiplied 21 by one more than the start value then subtracted 1; that may have been clearer — Bohemian, May 04 '16 at 06:05

nobody · Answer 3 · 2022-11-16T17:47:25.217

8

If you have regular ranges, a quicker solution would be to group with the help of div function.

For instance:

select diff div 20 as range_code, sum(user_number)
from new_table
group by diff div 20;

Ranges are represented as single digits in that case and you have to know what they mean: 0 = 0-19, 1 = 20-39, 2 = 40-59,...

If you need different ranges use different divider or maybe subtract some number from diff. For instance "(diff - 1) div 10" gives you ranges 1-10, 11-20, 21-30,...

edited Nov 16 '22 at 17:47

answered Jul 29 '14 at 09:39

nobody

64
5
15

This is simple and brilliant, thank you. Just change the word 'range' to 'range_code' for example as it is mysql reserved word. – Stefan Pavlov Nov 15 '22 at 15:58
`range` is changed to `range_code`. I am glad you like it, and find it useful. – nobody Nov 16 '22 at 17:52

score 7 · Accepted Answer · answered Jul 14 '11 at 01:38

7

Mysql as a delimiter for keywords uses backtick sign " ` ", not square brackets (like sql server)

answered Jul 14 '11 at 01:38

zerkms

249,484
69
436
539

score 2 · Answer 5 · answered Aug 06 '15 at 13:30

2

range is a mysql keyword. You should "scape" it using ´ :

select t.`range` as [`range`], ...

answered Aug 06 '15 at 13:30

jpereira

648
7
12

Itay Moav -Malimovka · Answer 6 · 2011-07-14T01:53:08.340

2

select 
case
when diff between 0 and 20 then ' 0-20'
when diff between 0 and 20 then ' 21-41'
when diff between 0 and 20 then ' 42-62'
when diff between 0 and 20 then ' 63-83'
when diff between 0 and 20 then ' 84-104'
when diff between 0 and 20 then ' 105-135'
else '136-156'
end; as 'range',
count(*) as 'number of users'


from new_table
group by range

edited Jul 14 '11 at 01:53

answered Jul 14 '11 at 01:38

Itay Moav -Malimovka

52,579
61
190
278

Nope, still wrong. Aliases should be delimited by backticks, not single quotes – zerkms Jul 14 '11 at 01:42
@zerkms - I tried looking for it, can you send a link to the right doc? I only see that key words and defined objects needs to be backticked – Itay Moav -Malimovka Jul 14 '11 at 01:44
@zerkms - so I was right, it is for identifiers, not aliases. b.t.w in comment below u said I still have error, that was a ; missing after the END. Fixed now. – Itay Moav -Malimovka Jul 14 '11 at 01:54
The very first sentence of that page says ALIASES ARE IDENTIFIERS. zerkms is right, stop arguing with him. – Dan Grossman Jul 14 '11 at 02:10
Yes, read the entire page, please. If you use single quotes everywhere you'll find that the query fails as you can't use them around aliases anywhere except the creation of the alias. `group by 'range'` will not group by the range column. Suck it up and stop arguing in favor of bad information and bad practice. It is unbecoming of someone in an engineering profession. – Dan Grossman Jul 14 '11 at 02:31
@Dan Grossman where in my solution I did `group by 'range'` When you define an aliad you put ', after you define an alias it becomes an identifier, **after** defining it, then you use `, but, have it your way, b.t.w my query is good, I really wonder why it was downvoted.... – Itay Moav -Malimovka Jul 14 '11 at 02:49
@Itay Moav: any reason to use in some cases single quotes and in some cases - backticks? The latter works in any cases, while the first - doesn't. Any reason to write error-prone queries? – zerkms Jul 14 '11 at 22:53

score 0 · Answer 7 · answered Jan 06 '21 at 11:53

This won't be the exact solution for this question, but it is just similar suggestion for someone other. I need to create the number buckets too, because if I groupped by number i got 9k different values.

I need to have smaller count of groups.

I managed it by grouping by logarithm (and round it). Now instead of 9k groups i have only 18 groups. (Then i will use it for PDF or CDF for 1-x scale score computation).

SELECT COUNT(*) AS `Rows`, round(log(`diff`)) f FROM `users` GROUP BY f ORDER BY f

score 0 · Answer 8 · answered Mar 08 '21 at 18:59

Here is a more generalized approach to binning in SQL:

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize
        FROM dual
    ) as prm
GROUP BY 1
ORDER BY floor(diff / binsize)

This way, you only have to provide the size of your range (called bins) once, in the sub query from dual.

The sub query returns a table of size 1 in both dimensions, a single row with a single column. This table is cross tabulated with each row of the other table, so its value is accessible in each row of the first table. This works without specifying a join condition.

As long as you only return a single row, you can add parameters to your sub query. For example, you can define upper and lower bounds to exclude certain features from your result this way:

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize,
            21 as above,
            83 as below
        FROM dual
    ) as prm
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)

If your RDBMS supports it, consider restructuring your query to a CTE (Common Table Expression), which helps in making the expression look neater and more tidy by putting the declaration of parameters right to the start of the whole statement:

WITH prms as (
    SELECT
        21 as binsize,
        21 as above,
        83 as below
    FROM dual
)
SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table, prms
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)

score 0 · Answer 9 · answered Jul 14 '11 at 01:37

0

One obvious mistake : Mysql uses backticks(

), not [](as sqlserver) . Change t.range as [range], count(*) as [number of users] to

t.range as `range`, count(*) as `number of users`

answered Jul 14 '11 at 01:37

a1ex07

36,826
12
90
103

actually, backticks is for field names, you meant ' ? – Itay Moav -Malimovka Jul 14 '11 at 01:38
@Itay Moav: nope, a1ex07 is correct, and your answer still contains error – zerkms Jul 14 '11 at 01:39
1

@Itay Moav: who "you"? I'm not the op. And his error is cspecified in the question – zerkms Jul 14 '11 at 01:43
Backticks are for field name and for aliases. And I believe for table names if you happen to have space or other characters which are not allowed. – a1ex07 Jul 14 '11 at 01:44
@a1ex07 - I do not see it anywhere in the doc that I have to use backticks for aliases, I do see it for field names and keywords. I can, but I can also use ', which I prefer to differentiate between aliases and keywords/real field names. – Itay Moav -Malimovka Jul 14 '11 at 01:50

score 0 · Answer 10 · edited May 23 '17 at 11:47

0

You might want to check Are square brackets valid in an SQL query?

I suspect that '[' and ']' are used in Microsoft's SQL but not mysql.

edited May 23 '17 at 11:47

Community

1
1

answered Jul 14 '11 at 01:42

lesleyb

21
2

group by range in mysql

10 Answers10

Linked

Related