Select first distinct rows on two columns

Question

I have a table similar to below (PatientData)

| id_number | phone        | email            | dob       |
| --        | ----------   | ----             | ----      |
| 6         | 04-------61  | example@gmail.com| 196949200 |
| 5         | 04-------61  | example@gmail.com| 296949200 |
| 4         | 55-------51  | example@gmail.com| 396949200 |
| 3         | 04-------61  | test@gmail.com   | 496949200 |
| 2         | 66-------61  | test@gmail.com   | 596949200 |
| 1         | 77-------61  | another@gmail.com| 696949200 |

I want to get rows that have a distinct 'phone' and 'email' (not a distinct combination) along with the data associated with the first instance (here dob as an example).

My desired output:

| id_number | phone        | email            | dob       |
| --        | ----------   | ----             | ----      |
| 6         | 04-------61  | example@gmail.com| 196949200 |
| 2         | 66-------61  | test@gmail.com   | 596949200 |
| 1         | 77-------61  | another@gmail.com| 696949200 |

Is there a way I can achieve this? I've tried using distinct and group by but I can't see how they would be helpful in this case.

If distinct on phone and email, your result for phone = 04-----61 will definitely be 2 rows, one is for example@gmail.com, another is test@gmail.com — learning, Dec 02 '22 at 06:17
There seems to be a rule missing. For if I just selected the first row (ID = 1) I'd be done. I'd have a distinct list (and dismissed three other phones and two other emails). You are showing three rows and dismissed one phone number only. So do you want to add a rule: "dismiss as few phones and emails as possible"? That would require some iterative method (in SQL: recursive query) where you take all phones' first rows and all emails' first rows and try all combinations and subsets to see which gets the longest list. Not really a task for SQL in my opinion. — Thorsten Kettner, Dec 05 '22 at 00:29

score 0 · Answer 1 · answered Dec 02 '22 at 08:02

DISTINCT (or GROUP BY) is a row operator that apply to all columns of the SELECT clause in the SELECT statement. If you want to operate your own specific distinct, you have to choose which data out of the distinct scope you need to get in the result. You can use either Max or MIN, or a random value...

As an example :

SELECT MIN(id), MIN(phone), email, MIN(dob)
FROM  "I have a table similar to below"
GROUP  BY email

score 0 · Accepted Answer · answered Dec 05 '22 at 00:36

After some Investigation the solution I came up with is below:

SELECT t1.*
FROM `patientData` AS t1
LEFT OUTER JOIN `patientData` AS t2
  ON (t1.mobile = t2.mobile OR t1.email = t2.email) AND t1.id_number < t2.id_number
WHERE (t2.id_number IS NULL)

Explanation: Return only the rows which do not have any id_numbers greater than it for each unique email and mobile

Select first distinct rows on two columns

2 Answers2