2

I recently asked a question here concerning an SQL query: Trouble wrapping head around complex SQL delete query

I now understand that what I'm trying to do is too complex to pull off with a single query or even multiple queries without some way to keep results in between. Therefore I decided to create a bash script (the end result will be something to do with a cronjob so bash is the most straightforward choice).

Consider the following table:

AssociatedClient:

+-----------+-----------------+
| Client_id | Registration_id |
+-----------+-----------------+
|         2 |               2 | 
|         3 |               2 | 
|         3 |               4 | 
|         4 |               5 | 
|         3 |               6 | 
|         5 |               6 | 
|         3 |               8 | 
|         8 |               9 | 
|         7 |              10 | 
+-----------------------------+

What I want to do is select all Registration_ids where the Client_id is in the list of Client_ids associated with a specific Registration_id.

Although I'm pretty noob with SQL, I found this query relatively easy:

SELECT `Registration_id` FROM `AssociatedClient` ac1
WHERE ac1.`Client_id` IN
                      (SELECT `Client_id` FROM `AssociatedClient` ac2 
                       WHERE ac2.`Registration_id` = $reg_id);

where $reg_id is just a bash variable.

This works but I would like to see it done with a self join, because it looks nicer, especially within a bash script where a lot of character clutter occurs. I'm afraid my SQL skills just don't reach that far.

Community
  • 1
  • 1
MarioDS
  • 12,895
  • 15
  • 65
  • 121

1 Answers1

2

If I've understood correctly, you should just be able to do a simple self join like so:

SELECT ac1.registration_id
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac2.client_id = ac1.client_id
WHERE ac2.registration_id = $reg_id

So what you are doing is scanning the table once, joining it to itself where the client_id matches. Then you are restricting the joined rows to ones where the 2nd version of the table has a specific id, leaving you with the different permutations of the join on the 1st table, and then just picking the registration_id from those rows.

So, given the example of a variable value of 6, try running the following statement:

SELECT 
  ac1.client_id AS client_id_1
, ac1.registration_id AS reg_id_1
, ac2.client_id AS client_id_2
, ac2.registration_id AS reg_id_2
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac1.client_id = ac2.client_id

and you'll notice the full set of joins. Then try adding the WHERE restriction and notice which rows come back. Then finally just pick the column you want.

You can check out a SQLFiddle I set up which tests it with a value of 6

Chris Cameron-Mills
  • 4,587
  • 1
  • 27
  • 28
  • That seems to work :) I must admit that at the moment I don't fully understand why, but I'll try figure it out... Edit: okay your explanation makes sense. Thanks. – MarioDS Feb 27 '13 at 10:38
  • I've tried to add an extra explanation. I'll update with a detailed step through if it will help you. – Chris Cameron-Mills Feb 27 '13 at 10:39
  • @tombom that was my bad. Originally I was selecting the `client_id`, should have been `registration_id`. I'll wait to accept this answer until I've seen more opinions. – MarioDS Feb 27 '13 at 10:45
  • @ChrisCameron-Mills thanks for your work. You also showed me that for optimalisation I needed to add `DISTINCT` and filter out the original ID. This query is a second level nested loop in my bash script so optimalisations like these do matter. – MarioDS Feb 27 '13 at 10:50
  • @tombom The fiddle was based on the explanation in the post which stated that the OP wanted to select a registration id. True, the original query might be fine but the question asked how it would be possible with a self join which is what my answer detailed. – Chris Cameron-Mills Feb 27 '13 at 10:53
  • 1
    @MarioDeSchaepmeester glad it helped. Yes, if you want to avoid duplicates a `DISTINCT` or `GROUP BY` will help. If you are interested in how a self join and nested sub-query might affect performance of larger datasets you can look into query plans (http://en.wikipedia.org/wiki/Query_plan) – Chris Cameron-Mills Feb 27 '13 at 10:56