MySQL v5.7
Background
I have a MySQL table called managee_managers
, which shows reporting relationships between members of a User class. The relevant parts of the model are:
integer "manager_id", null: false
integer "managee_id", null: false
integer "account_id"
So for example, on a given account, User A manages User B, who manages User C, etc.
My goal is to be able to detect cyclical connected components before a new instance of this class is saved, for example someone creating a managee_manager row where User C now manages User A. If we can detect a potential cycle, we can alert the user that creating a new managee_manager is invalid.
Our largest accounts have on the order of 100,000 rows in this table, and we support up to 10 levels in the manager-managee hierarchy tree. The app is a Rails app, and I've tried using Ruby's TSort library to do this directly in Ruby, but it takes too long for our purposes. I need a solution that can push this down to the MySQL level, and I'm thinking of a stored procedure to handle this.
So far, my SQL query looks like this:
delimiter //
CREATE PROCEDURE has_cycle()
BEGIN
SELECT
l1.account_id as account_id,
l1.managee_id as l1_id,
l2.managee_id as l2_id,
l3.managee_id as l3_id,
l4.managee_id as l4_id,
l5.managee_id as l5_id,
l6.managee_id as l6_id,
l7.managee_id as l7_id,
l8.managee_id as l8_id,
l9.managee_id as l9_id,
l10.managee_id as l10_id,
l11.managee_id as l11_id,
l12.managee_id as l12_id,
l13.managee_id as l13_id,
l14.managee_id as l14_id,
l15.managee_id as l15_id
from managee_managers l1
LEFT JOIN managee_managers l2
ON l1.manager_id = l2.managee_id
LEFT JOIN managee_managers l3
ON l2.manager_id = l3.managee_id
LEFT JOIN managee_managers l4
ON l3.manager_id = l4.managee_id
LEFT JOIN managee_managers l5
ON l4.manager_id = l5.managee_id
LEFT JOIN managee_managers l6
ON l5.manager_id = l6.managee_id
LEFT JOIN managee_managers l7
ON l6.manager_id = l7.managee_id
LEFT JOIN managee_managers l8
ON l7.manager_id = l8.managee_id
LEFT JOIN managee_managers l9
ON l8.manager_id = l9.managee_id
LEFT JOIN managee_managers l10
ON l9.manager_id = l10.managee_id
LEFT JOIN managee_managers l11
ON l10.manager_id = l11.managee_id
LEFT JOIN managee_managers l12
ON l11.manager_id = l12.managee_id
LEFT JOIN managee_managers l13
ON l12.manager_id = l13.managee_id
LEFT JOIN managee_managers l14
ON l13.manager_id = l14.managee_id
LEFT JOIN managee_managers l15
ON l14.manager_id = l15.managee_id
LIMIT 1000;
END//
delimiter ;
I built it with up to 15 levels, which admittedly is a bit of overkill given we'll only support up to 10 layers in the management hierarchy.
This produces results which look like this:
CALL has_cycle;
+------------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| account_id | l1_id | l2_id | l3_id | l4_id | l5_id | l6_id | l7_id | l8_id | l9_id | l10_id | l11_id | l12_id | l13_id | l14_id | l15_id |
+------------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 3708867 | 6588137 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3111155 | 4800685 | 4800555 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3413985 | 6604007 | 5451955 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 3413985 | 6604057 | 5452245 | 5451955 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1598855 | 2667475 | 5888957 | 5012155 | 10635375 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 4033317 | 6131407 | 6131187 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 |
| 3952447 | 6036007 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
You can see on the 6th row that there's a cycle in the row data which extends all the way to column 15:
+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| l1_id | l2_id | l3_id | l4_id | l5_id | l6_id | l7_id | l8_id | l9_id | l10_id | l11_id | l12_id | l13_id | l14_id | l15_id |
+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 6131407 | 6131187 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 | 6198247 | 15245335 | 9736545 | 6198267 |
6198267 l3_id => 6198247 l4_id => 15245335 l5_id => 9736545 l6_id => 6198267 l7_id
So 6198267
is both the l3_id
and l7_id
here.
Question
I'm not sure how to add the cycle detection logic to the existing query I have above. I'd love to be able to tell you that I tried X, Y, and Z strategies, but I don't even know where to start.
This is MySQL 5.7, not 8.0, so I know I can't use recursive CTEs.