
I need to find matched pairs of records in SQL Server, but each record can be included in only one pair. Once a record has been matched into a pair, it should be removed from consideration for any future pairs.

I have tried solutions involving ROW_NUMBER() and LEAD(), but I just can't quite get there.

This will be used to pair financial accounts with similar accounts for review, based on multiple customer attributes such as credit score, income, etc.

Statement:

declare @test table (ID numeric, Color varchar(20))
insert into @test values
        (1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red')

select *
from @test t1
join @test t2 
    on t1.Color = t2.Color
    and t1.ID < t2.ID           -----removes reverse-pairs and self-pairs

Current results:

ID  Color   ID  Color
--- ------- --- --------
1   Blue    3   Blue
1   Blue    5   Blue        -----should not appear because 1 has already been paired
3   Blue    5   Blue        -----should not appear because 3 has already been paired
2   Red     6   Red
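To see why the self-join over-matches, here is a minimal sketch that reruns the question's query through Python's standard sqlite3 module. Assumptions: SQLite stands in for SQL Server, a plain table named test replaces the @test table variable, and the ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server,
# and the ordinary table "test" replaces the @test table variable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER, Color TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "Blue"), (2, "Red"), (3, "Blue"), (4, "Yellow"), (5, "Blue"), (6, "Red")],
)

# The question's self-join: t1.ID < t2.ID removes self-pairs and reversed
# pairs, but nothing stops one record from appearing in several pairs.
naive_pairs = conn.execute(
    """
    SELECT t1.ID, t2.ID
    FROM test t1
    JOIN test t2
      ON t1.Color = t2.Color
     AND t1.ID < t2.ID
    ORDER BY t1.ID, t2.ID
    """
).fetchall()

print(naive_pairs)  # [(1, 3), (1, 5), (2, 6), (3, 5)]
```

ID 1 appears in two pairs and ID 3 in two pairs, which is exactly the over-matching the question wants to eliminate.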

Needed results:

ID  Color   ID  Color
--- ------- --- --------
1   Blue    3   Blue
2   Red     6   Red
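Procedurally, the rule described above amounts to walking each color's records in ID order and taking them two at a time, leaving any odd record unpaired. The helper below, greedy_pairs, is a hypothetical sketch of those intended semantics for exact matches on a single attribute, as in the sample data; it does not attempt matching on "similar" values such as credit score ranges.

```python
from collections import defaultdict


def greedy_pairs(records):
    """Hypothetical helper: pair same-color records in ascending ID order.

    Once a record is paired it is no longer available, and a leftover
    record (odd count within a color) stays unpaired.
    """
    by_color = defaultdict(list)
    for record_id, color in sorted(records):  # sorts by ID first
        by_color[color].append(record_id)

    pairs = []
    for color, ids in by_color.items():
        # Take IDs two at a time: (1st, 2nd), (3rd, 4th), ...
        for i in range(0, len(ids) - 1, 2):
            pairs.append((ids[i], color, ids[i + 1], color))
    return sorted(pairs)


test_rows = [(1, "Blue"), (2, "Red"), (3, "Blue"), (4, "Yellow"), (5, "Blue"), (6, "Red")]
print(greedy_pairs(test_rows))
# [(1, 'Blue', 3, 'Blue'), (2, 'Red', 6, 'Red')]
```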
Asked by Carly Reum (edited by GMB)

2 Answers


Edited to incorporate Max's comments.

Here is one way to get this done.

First, I rank the records within each color by ID: the lowest ID gets rnk = 1, the next one rnk = 2, and so on.

Then I join the ranked set to itself, pairing each odd-ranked record (rnk = 1, 3, 5, ...) with the record of the same color ranked immediately after it.

declare @test table (ID numeric, Color varchar(20))
insert into @test values
        (1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red'),(7,'Blue')

;with data
  as (select row_number() over(partition by color order by id asc) as rnk
            ,color
            ,id
       from @test
       )
select a.id,a.color,b.id,b.color
 from data a
 join data b
   on a.Color=b.Color
  and b.rnk=a.rnk+1
where a.rnk%2=1

This produces the following output:

+----+-------+----+-------+
| id | color | id | color |
+----+-------+----+-------+
|  1 | Blue  |  3 | Blue  |
|  5 | Blue  |  7 | Blue  |
|  2 | Red   |  6 | Red   |
+----+-------+----+-------+
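The same ranking-and-self-join approach can be checked outside SQL Server. This is a minimal sketch assuming SQLite 3.25 or later (the release that added window functions) as a stand-in, with a plain table named test replacing @test and the 7th Blue row included. The ORDER BY is an addition: without one, the row order shown in the output above is not guaranteed.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; "test" replaces @test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER, Color TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "Blue"), (2, "Red"), (3, "Blue"), (4, "Yellow"),
     (5, "Blue"), (6, "Red"), (7, "Blue")],
)

rows = conn.execute(
    """
    WITH data AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY Color ORDER BY ID) AS rnk,
               Color, ID
        FROM test
    )
    SELECT a.ID, a.Color, b.ID, b.Color
    FROM data a
    JOIN data b
      ON a.Color = b.Color
     AND b.rnk = a.rnk + 1   -- partner is the next record of the same color
    WHERE a.rnk % 2 = 1      -- only odd ranks start a pair
    ORDER BY a.ID
    """
).fetchall()

print(rows)
# [(1, 'Blue', 3, 'Blue'), (2, 'Red', 6, 'Red'), (5, 'Blue', 7, 'Blue')]
```

Yellow (ID 4) has rnk = 1 but no rnk = 2 partner, so the inner join drops it.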
Answer by George Joseph
  • How could I enhance this awesome solution to capture additional pairs that share the same Color as these first two records? For instance, if there were a 7th ID that was also Blue, then that ID would be available to pair with ID 5. I can do this with AND ( (a.rnk = 1 and b.rnk = 2) OR (a.rnk = 3 and b.rnk = 4) ), but is there a better way to capture many more pairs? – Carly Reum Mar 10 '20 at 16:35
    @CarlyReum `WHERE a.RowNum % 2 = 1 AND b.RowNum = a.RowNum + 1` (and remove the `and b.rnk=2` clause from the join). [fiddle](http://sqlfiddle.com/#!18/15eb5/1) – Max Szczurek Mar 10 '20 at 16:39
  • Perfect!!! This combines what @GMB did, but without the aggregation, and what GeorgeJoseph did! Thank you all! – Carly Reum Mar 10 '20 at 16:43

You could use row_number() and conditional aggregation:

select
    max(case when rn % 2 = 0 then id end) id1,
    max(case when rn % 2 = 0 then color end) color1,
    max(case when rn % 2 = 1 then id end) id2,
    max(case when rn % 2 = 1 then color end) color2
from (
    select
        t.*,
        row_number() over(partition by color order by id) - 1 rn
    from @test t
) t
group by color, rn / 2
having count(*) = 2

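The subquery numbers the records within each color by increasing id, starting from 0. The outer query then groups consecutive records two at a time (rn / 2, using integer division), and the HAVING clause keeps only groups that contain two records, so a leftover record (such as Blue ID 5 or the lone Yellow ID 4) is dropped.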

Demo on DB Fiddle:

id1 | color1 | id2 | color2
:-- | :----- | :-- | :-----
1   | Blue   | 3   | Blue  
2   | Red    | 6   | Red   
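This conditional-aggregation query can also be checked locally. A minimal sketch, again assuming SQLite 3.25 or later in place of SQL Server and a plain table named test in place of @test; the ORDER BY id1 is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for SQL Server; "test" replaces @test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER, Color TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "Blue"), (2, "Red"), (3, "Blue"), (4, "Yellow"), (5, "Blue"), (6, "Red")],
)

rows = conn.execute(
    """
    SELECT
        MAX(CASE WHEN rn % 2 = 0 THEN ID END)    AS id1,
        MAX(CASE WHEN rn % 2 = 0 THEN Color END) AS color1,
        MAX(CASE WHEN rn % 2 = 1 THEN ID END)    AS id2,
        MAX(CASE WHEN rn % 2 = 1 THEN Color END) AS color2
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY Color ORDER BY ID) - 1 AS rn
        FROM test t
    ) t
    GROUP BY Color, rn / 2      -- integer division: rn 0,1 -> 0; 2,3 -> 1; ...
    HAVING COUNT(*) = 2         -- drop the unpaired leftover in each color
    ORDER BY id1
    """
).fetchall()

print(rows)
# [(1, 'Blue', 3, 'Blue'), (2, 'Red', 6, 'Red')]
```

Here the HAVING clause does the job that the b.rnk = a.rnk + 1 join condition does in the other answer: it discards rows that have no partner, without needing a self-join.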
Answer by GMB