SQL Server Count data in 2 columns with different values

Question

I want to make a smart count operation, so that if the data in columns are the same, then it will be counted as 1.

My table is:

dbo.Messages 
(
    FromUserId INT,
    ToUserId INT
)

Data:

INSERT dbo.Messages VALUES(1, 5), (2, 20), (5, 1), (1, 5);

The count should return 2 because (1,5) and (5,1) is the same in my algorithm.

How can I write it in SQL Server TSQL?

Thanks in advance.

Tim Biegeleisen · Answer 1 · 2016-08-15T11:22:05.637

One way to go about this is to group by the least and greatest of the FromUserId and ToUserId, using the distinct values from your original table. Since SQL Server, unlike MySQL, does not have a LEAST and GREATEST function, we can use CASE expressions instead.

SELECT CASE WHEN t.FromUserId < t.ToUserId THEN t.FromUserId ELSE t.ToUserId END,
       CASE WHEN t.FromUserId < t.ToUserId THEN t.ToUserId   ELSE t.FromUserId END,
       COUNT(*) AS duplicateCount
FROM
(
    SELECT DISTINCT FromUserId, ToUserId
    FROM dbo.Messages
) t
GROUP BY CASE WHEN t.FromUserId < t.ToUserId THEN t.FromUserId ELSE t.ToUserId END,
         CASE WHEN t.FromUserId < t.ToUserId THEN t.ToUserId   ELSE t.FromUserId END

Evaldas Buinauskas · Accepted Answer · 2016-08-15T11:32:11.500

This works quite well:

CREATE TABLE #Messages 
(
    FromUserId INT,
    ToUserId INT
);

INSERT #Messages VALUES(1, 5), (2, 20), (5, 1), (1, 5);

SELECT COUNT(*)
FROM (
  SELECT M1.FromUserId, M1.ToUserId
  FROM #Messages AS M1
  EXCEPT
  SELECT M2.ToUserId, M2.FromUserId
  FROM #Messages AS M2
  WHERE M2.ToUserId > M2.FromUserId
  ) AS T;

Derived table with EXCEPT will remove your duplicates and then it just counts so called unique values. Keep in mind that here's no need for DISTINCT keyword, EXCEPT removes all dupes.

Results from derived table:

FromUserId ToUserId 
---------- -------- 
1          5        
2          20

You can check how this query works here: https://data.stackexchange.com/stackoverflow/query/524634/counting-unique-values

score 1 · Answer 3 · edited May 23 '17 at 10:28

1

On SQL Server 2008 and later this should work:

SELECT distinct
    (SELECT Min(v) FROM (VALUES (FromUserId), (ToUserId)) AS value(v)) as UserIdMin,
    (SELECT Max(v) FROM (VALUES (FromUserId), (ToUserId)) AS value(v)) as UserIdMax
FROM dbo.Messages

Cred to: SQL MAX of multiple columns?

edited May 23 '17 at 10:28

Community

1
1

answered Aug 15 '16 at 11:25

Daniel Stackenland

3,149
1
19
22

1

This is great answer. To add something up, you don't need two seperate subqueries and could use `CROSS APPLY` to return both `MIN` and `MAX` values. Here's an example: https://data.stackexchange.com/stackoverflow/query/524750/counting-unique-values – Evaldas Buinauskas Aug 15 '16 at 14:29

score 0 · Answer 4 · edited Jan 18 '21 at 12:32

0

Demo here

select distinct  t1.*
from
#temp t1
join
#temp t2
on t1.FromUserId=t2.ToUserId
and t1.ToUserId=t2.FromUserId

edited Jan 18 '21 at 12:32

Community

1
1

answered Aug 15 '16 at 11:16

TheGameiswar

27,855
8
56
94

I don't think it meets A/C. If you would remove count and keep just distinct, it brings back 1 and 5 and that seems wrong. 1, 5 and 5, 1 should be treated as same. – Evaldas Buinauskas Aug 15 '16 at 13:51
@EvaldasBuinauskas:User is asking for count and in this case it returns2,did you saw the demo – TheGameiswar Aug 15 '16 at 13:53
I did. But it just happens that it brings 2. Edit data and your query doesn't work anymore. It will bring back incorrect count. – Evaldas Buinauskas Aug 15 '16 at 13:57
@EvaldasBuinauskas:thanks ,i was thinking of this line from users question whole time "count should return 2" ,updated now – TheGameiswar Aug 15 '16 at 14:07

SQL Server Count data in 2 columns with different values

4 Answers4