Situation
My goal is to have a yearly cronjob that deletes certain data from a database based on age. To my disposal I have the powers of Bash and MySQL. I started with writing a bash script but then it struck me that maybe, I could do everything with just a single SQL query.
I'm more a programmer by nature and I haven't had much experience with data structures so that's why I would like some help.
Tables / data structure
The relevant tables and columns for this query are as follows:
Registration:
+-----+-------------------+
| Id | Registration_date |
+-----+-------------------+
| 2 | 2011-10-03 |
| 3 | 2011-10-06 |
| 4 | 2011-10-07 |
| 5 | 2011-10-07 |
| 6 | 2011-10-10 |
| 7 | 2011-10-13 |
| 8 | 2011-10-14 |
| 9 | 2011-10-14 |
| 10 | 2011-10-17 |
+-------------------------+
AssociatedClient:
+-----------+-----------------+
| Client_id | Registration_id |
+-----------+-----------------+
| 2 | 2 |
| 3 | 2 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 5 | 6 |
| 3 | 8 |
| 8 | 9 |
| 7 | 10 |
+-----------------------------+
Client: only Id is relevant here.
As you can see, this is a simple many-to-many relationship. A client can have multiple registrations to his name, and a registration can have multiple clients.
The goal
I need to delete all registrations and client data for clients who have not had a new registration in 5 years. Sounds simple, right?
The tricky part
The data should be kept if any other client on any registration from a specific client has a new registration within 5 years.
So imagine client A having 4 registrations with just him in them, and 1 registration with himself and client B. All 5 registrations are older than 5 years. If client B did not have a new registration in 5 years, everything should be deleted: client A registrations and record. If B did have a new registration within 5 years, all client A data should be kept, including his own old registrations.
What I've tried
Building my query, I got about this far:
DELETE * FROM `Registration` AS Reg
WHERE TIMESTAMPDIFF(YEAR, Reg.`Registration_date`, NOW()) >= 5
AND
(COUNT(`Id`) FROM `Registration` AS Reg2
WHERE Reg2.`Id` IN (SELECT `Registration_id` FROM `AssociatedClient` AS Clients
WHERE Clients.`Client_id` IN (SELECT `Client_id` FROM `AssociatedClient` AS Clients2
WHERE Clients2.`Registration_id` IN -- stuck
#I need all the registrations from the clients associated with the first
# (outer) registration here, that are newer than 5 years.
) = 0 -- No newer registrations from any associated clients
Please understand that I have very limited experience with SQL. I realise that even what I got so far can be heavily optimised (with joins etc) and may not even be correct.
The reason I got stuck is that the solution I had in mind would work if I could use some kind of loop, and I only just realised that this is not something you easily do in an SQL query of this kind.
Any help
Is much appreciated.