SQL to get distinct record for a combination of two column (Irrespective of order)

Question

Consider the below Table structure and data

CREATE TABLE Distance(
      source VARCHAR(20),
      destination VARCHAR(20),
      distance INTEGER
      );

Select * from Distance;

source     destination  distance
=======    ===========  ======
Chennai    Mumbai       500
Mumbai     Chennai      500
Mumbai     Bangalore    500
Bangalore  Mumbai       500
Goa        Mumbai       100
Mumbai     Goa          100
Kolkata    Goa          1000

I need the output to have single record for 2 cities if repeating, i,e, any one record among the below 2 is fine.

Chennai    Mumbai       500
Mumbai     Chennai      500

Expected o/p:
source     destination  distance
=======    ===========  ======
Chennai    Mumbai       500
Mumbai     Bangalore    500
Goa        Mumbai       100
Kolkata    Goa          1000

Gordon Linoff · Answer 1 · 2015-11-23T15:49:54.813

Here is one method using least() and greatest():

select least(source, destination), greatest(source, destination), max(distance)
from distance
group by least(source, destination), greatest(source, destination);

This has the disadvantage that you could return a row not in the table. For instance, if you had a single row with "Mumbai/Chennai/500", then this query would return "Chennai/Mumbai/500" -- and this row is not in the original table.

So, an alternative method is:

select source, destination, distance
from distance
where source < destination
union all
select destination, source, distance
from distance d
where source > destination and
      not exists (select 1
                  from distance d2
                  where d2.source = d.destination and d2.destination = d.source
                 );

This version is also ANSI-compatible and should work in all databases.

It's Standard SQL, but doesn't preserve the order of columns :) — dnoeth, Nov 23 '15 at 17:32

score 3 · Answer 2 · answered Nov 23 '15 at 17:31

If you need to preserve the order of columns you might use

SELECT *
FROM Distance t1
WHERE NOT EXISTS
 (
   SELECT * FROM Distance t2
   WHERE t1.destination = t2.source
     AND t1.source = t2.destination
     AND t1.destination > t2.destination
 );

When multiple rows per source/combination exist you must either add DISTINCT or a GROUP BY.

score 1 · Answer 3 · answered Nov 23 '15 at 15:13

1

SELECT DISTINCT LEAST(source,destination) a
              , GREATEST(source,destination) b
              , distance 
           FROM distance;

answered Nov 23 '15 at 15:13

Strawberry

33,750
13
40
57

I like this solution but it is not SQL standard. It would run on MySQL, PostgreSQL and Oracle but not on SQL Server. See here: https://stackoverflow.com/questions/3794451/greatest-and-least-in-sql-standard – Nick Oct 19 '19 at 15:07
Well, things were different back then – Strawberry Oct 19 '19 at 16:23

score 0 · Answer 4 · edited Nov 22 '20 at 01:12

0

select   
(case when source>destination then source else destination end) as src,
(case when source<destination then source else destination end) as dstn,
distance from distance

edited Nov 22 '20 at 01:12

D. Schreier

1,700
1
22
34

answered Nov 21 '20 at 17:01

sandeep gupta

21
5

2

You are adding an answer to a question with multiple upvoted answers that were posted 5 years ago. You should add some text to explain why, after 5 years, your answer is only showing up now. Is this new syntax that didn't exist 5 years ago? Is your answer better than the others? If the answer to both of these questions is "No", then your answer will probably just confuse others that find this question and look to the answers for guidance, and so is then inappropriate. – CryptoFool Nov 22 '20 at 04:43

score 0 · Answer 5 · answered Nov 28 '20 at 08:13

The following logic seems much easier and easy to understand and does the same work.

    select a.source,a.destination,a.distance as distance  from
    distance a join distance b on a.destination = b.source and b.destination = a.source
    and a.source < b.source
    union all
    select a.source,a.destination,a.distance from distance a  left join distance b 
    on a.destination = b.source and b.destination = a.source
    where b.source is NULL ;

SQL to get distinct record for a combination of two column (Irrespective of order)

5 Answers5

Linked

Related