SQL group by if values are close

Question

Class| Value
-------------
A    | 1
A    | 2
A    | 3
A    | 10
B    | 1

I am not sure whether it is practical to achieve this using SQL. If the difference of values are less than 5 (or x), then group the rows (of course with the same Class)

Expected result

Class| ValueMin | ValueMax
---------------------------
A    | 1     |   3
A    | 10    |   10
B    | 1     |   1

For fixed intervals, we can easily use "GROUP BY". But now the grouping is based on nearby row's value. So if the values are consecutive or very close, they will be "chained together".

Thank you very much

Assuming MSSQL

You should always mention which SQL variant & Version you are using. SQL Server? Teradata? PostgreSQL? MySQL? as it is easy to do in some and not so easily in others. — , Dec 05 '13 at 14:57
http://stackoverflow.com/questions/7258465/group-close-numbers — Mihai, Dec 05 '13 at 15:01
What do you want to happen if you have values like 1, 3, 5, 7, 9, such that 1, is close to 3, is close to 5, is close to 7, etc, such that the range of near values extends far beyond 5? How will you decide where to break them? — Joel Coehoorn, Dec 05 '13 at 15:01
@Joel Coehoorn Chain them into same group. So the table may have to be sorted before grouping..? — Yiping, Dec 05 '13 at 15:03
@Yiping So, in that example group, you'd want to see a result with ValueMin 1, ValueMax 9? It seems like for large data sets, you'd end up with everything in the same group. Hmm... this is also tricker, because now what you're looking for first is _gaps_. — Joel Coehoorn, Dec 05 '13 at 15:32

Gordon Linoff · Answer 1 · 2013-12-05T17:03:41.827

2

You are trying to group things by gaps between values. The easiest way to do this is to use the lag() function to find the gaps:

select class, min(value) as minvalue, max(value) as maxvalue
from (select class, value,
             sum(IsNewGroup) over (partition by class order by value) as GroupId
      from (select class, value,
                   (case when lag(value) over (partition by class order by value) > value - 5
                         then 0 else 1
                    end) as IsNewGroup
            from t
           ) t
     ) t
group by class, groupid;

Note that this assumes SQL Server 2012 for the use of lag() and cumulative sum.

edited Dec 05 '13 at 17:03

answered Dec 05 '13 at 15:10

Gordon Linoff

1,242,037
58
646
786

I just tried, and the results is class minvalue maxvalue A 1 10 B 10 10 – Yiping Dec 05 '13 at 15:21
@Yiping . . . The problem was the `NULL` return value for `lag()` at the beginning. Easily fixed by changing the sign of the comparison. – Gordon Linoff Dec 05 '13 at 17:04
If I understand correct: you assign each a new groupid if the (current value - previous value) overshot the range 5, then "GROUP BY" the Class and groupid . This is the most intuitive – Yiping Dec 06 '13 at 04:02
@Yiping . . . This identifies where each new group starts (based on the `lag()`. That is a flag that is 0 or 1. It then does a cumulative sum, so everything within one group has the same group value. – Gordon Linoff Dec 06 '13 at 14:57

score 2 · Answer 2 · 2013-12-05T21:43:22.187

2

Update: *This answer is incorrect*

Assuming the table you gave is called sd_test, the following query will give you the output you are expecting

In short, we need a way to find what was the value on the previous row. This is determined using a join on row ids. Then create a group to see if the difference is less than 5. and then it is just regular 'Group By'.

If your version of SQL Server supports windowing functions with partitioning the code would be much more readable.

SELECT 
A.CLASS
,MIN(A.VALUE) AS MIN_VALUE
,MAX(A.VALUE) AS MAX_VALUE
FROM
     (SELECT 
      ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
      ,CLASS
      ,VALUE
      FROM SD_TEST) AS A
LEFT JOIN 
     (SELECT 
       ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
      ,CLASS
      ,VALUE
     FROM SD_TEST) AS B
ON A.CLASS = B.CLASS AND A.ROW_ID=B.ROW_ID+1
GROUP BY A.CLASS,CASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END
ORDER BY A.CLASS,cASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END DESC

ps: I think the above is ANSI compliant. So should run in most SQL variants. Someone can correct me if it is not.

edited Dec 05 '13 at 21:43

answered Dec 05 '13 at 15:12

Just verified and it is working! Looks like you are shifting the ROW_ID by 1 so we can retrieve the "previous row" value. – Yiping Dec 05 '13 at 15:19
This doesn't give the correct answer [sqlfiddle](http://www.sqlfiddle.com/#!3/6152c/1/0) – conan Dec 05 '13 at 16:29
@conan Good catch! I will have to think about this. In the meanwhile I will amend my answer to indicate it is incorrect. – Dec 05 '13 at 21:42
@Yiping It appears my solution does not work beyond the original test case. You can un-accept my answer and check out conan's answer - it appears to be working as you would expect. – Dec 05 '13 at 21:49
@Sashikanth Dareddy I went through about 6 methods that nearly solved it. It's only when I expanded the test data out that the bugs appeared. – conan Dec 05 '13 at 21:53

Linger · Answer 3 · 2013-12-05T15:39:06.173

1

Here is one way of getting the information you are after:

SELECT Under5.Class,
  (
    SELECT MIN(m2.Value) 
    FROM MyTable AS m2 
    WHERE m2.Value < 5 
      AND m2.Class = Under5.Class
  ) AS ValueMin,
  (
    SELECT MAX(m3.Value) 
    FROM MyTable AS m3 
    WHERE m3.Value < 5 
      AND m3.Class = Under5.Class
  ) AS ValueMax
FROM 
(
  SELECT DISTINCT m1.Class
  FROM MyTable AS m1 
  WHERE m1.Value < 5
) AS Under5
UNION
SELECT Over4.Class,
  (
    SELECT MIN(m4.Value) 
    FROM MyTable AS m4 
    WHERE m4.Value >= 5 
      AND m4.Class = Over4.Class
  ) AS ValueMin,
  (
    SELECT Max(m5.Value) 
    FROM MyTable AS m5 
    WHERE m5.Value >= 5 
      AND m5.Class = Over4.Class
  ) AS ValueMax
FROM 
(
  SELECT DISTINCT m6.Class
  FROM MyTable AS m6 
  WHERE m6.Value >= 5
) AS Over4

edited Dec 05 '13 at 15:39

answered Dec 05 '13 at 15:11

Linger

14,942
23
52
79

This is correct. (Except we have to change all the m2.class in the bottom part to m3,m4,m5). Thank you! – Yiping Dec 05 '13 at 15:24
*@Yiping*, good catch I fixed my answer to correct the Aliases. – Linger Dec 05 '13 at 15:31
Sorry I found this is not correct.. This try to group by >=5 or < 5, but my request is to group by values that are close. [3,5,7], [ 13,13,14], [ 25,27,31,35] – Yiping Dec 06 '13 at 00:55

conan · Accepted Answer · 2013-12-06T09:29:37.757

These give the correct result, using the fact that you must have the same number of group starts as ends and that they will both be in ascending order.

if object_id('tempdb..#temp') is not null drop table #temp

create table #temp (class char(1),Value int);

insert into #temp values ('A',1);
insert into #temp values ('A',2);
insert into #temp values ('A',3);
insert into #temp values ('A',10);
insert into #temp values ('A',13);
insert into #temp values ('A',14);
insert into #temp values ('b',7);
insert into #temp values ('b',8);
insert into #temp values ('b',9);
insert into #temp values ('b',12);
insert into #temp values ('b',22);
insert into #temp values ('b',26);
insert into #temp values ('b',67);

Method 1 Using CTE and row offsets

with cte as
(select  distinct class,value,ROW_NUMBER() over ( partition by class order by value ) as R from #temp),
cte2 as
(
    select 
        c1.class
        ,c1.value
        ,c2.R as PreviousRec
        ,c3.r as NextRec
    from 
        cte c1
        left join cte c2 on (c1.class = c2.class and c1.R= c2.R+1 and c1.Value < c2.value + 5)
        left join cte c3 on (c1.class = c3.class and c1.R= c3.R-1 and c1.Value > c3.value - 5)
)

select
    Starts.Class
    ,Starts.Value as StartValue
    ,Ends.Value as EndValue
from
    (
     select 
        class
        ,value
        ,row_number() over ( partition by class order by value ) as GroupNumber
    from cte2
        where PreviousRec is null) as Starts join
    (
     select 
        class
        ,value
        ,row_number() over ( partition by class order by value ) as GroupNumber
    from cte2
        where NextRec is null) as Ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber

** Method 2 Inline views using not exists **

select
        Starts.Class
        ,Starts.Value as StartValue
        ,Ends.Value as EndValue            
from
    (
        select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
        from
            (select distinct class,value from #temp) as T
        where not exists (select 1 from #temp where class=t.class and Value < t.Value and Value > t.Value -5 )
    ) Starts join
    (
        select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
        from
            (select distinct class,value from #temp) as T
        where not exists (select 1 from #temp where class=t.class and Value > t.Value and Value < t.Value +5 )
    ) ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber

In both methods I use a select distinct to begin because if you have a dulpicate entry at a group start or end things go awry without it.

@Yiping Oddly I used CTE to make it more readable :). Anyway I have edited it to add an alternate method. — conan, Dec 06 '13 at 09:20

SQL group by if values are close

4 Answers4