4

My question is very similar to Multiple NOT distinct only it deals with multiple columns instead of one. I have a table like so:

A B C
1 1 0
1 2 1
2 1 2
2 1 3
2 2 4
2 3 5
2 3 6
3 1 7
3 3 8
3 1 9

And the result should be:

A B C
2 1 2
2 1 3
2 3 5
2 3 6
3 1 7
3 1 9

Essentially, like the above question, removing all unique entries only where uniqueness is determined by two columns instead of one. I already tried various tweaks to the above answer but couldn't get any of them to work.

Community
  • 1
  • 1

2 Answers2

3

You are using SQL Server, so this is easier than in Access:

select A, B, C
from (select t.*, count(*) over (partition by A, B) as cnt
      from t
     ) t
where cnt > 1;

This use of count(*) is as a window function. It is counting the number of rows with the same value of A and B. The final where just selects the rows that have more than one entry.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Works like a charm! Thanks! Any clarification of what the OVER and the PARTITION BY statements are doing? – 1215drew -NWR- Jul 23 '13 at 15:36
  • @1215drew-NWR- . . . These used for window functions, which are very useful. Database documentation covers these pretty well (at least SQL Server and Oracle). They are part of the standard and readily available (SQL Server, Oracle, Postgres, DB2, for instance). I know I cover them in my book "Data Analysis Using SQL and Excel". Basic SQL books don't usually cover this topic. – Gordon Linoff Jul 23 '13 at 15:48
  • Thanks, I'll dig into the documentation a bit more, and maybe check out your book. – 1215drew -NWR- Jul 24 '13 at 19:06
1

Another possible solution with EXISTS

SELECT a, b, c 
  FROM Table1 t
 WHERE EXISTS 
(
  SELECT 1 
    FROM Table1
   WHERE a =  t.a 
     AND b =  t.b
     AND c <> t.c
)

It should be fast enough.

Output:

| A | B | C |
-------------
| 2 | 1 | 2 |
| 2 | 1 | 3 |
| 2 | 3 | 5 |
| 2 | 3 | 6 |
| 3 | 1 | 7 |
| 3 | 1 | 9 |

Here is SQLFiddle demo

peterm
  • 91,357
  • 15
  • 148
  • 157