0
MySQL v5.7

Background

I have a MySQL table called managee_managers, which shows reporting relationships between members of a User class. The relevant parts of the model are:

integer "manager_id", null: false
integer "managee_id", null: false
integer "account_id"

So for example, on a given account, User A manages User B, who manages User C, etc.

My goal is to be able to detect cyclical connected components before a new instance of this class is saved, for example someone creating a managee_manager row where User C now manages User A. If we can detect a potential cycle, we can alert the user that creating a new managee_manager is invalid.

Our largest accounts have on the order of 100,000 rows in this table, and we support up to 10 levels in the manager-managee hierarchy tree. The app is a Rails app, and I've tried using Ruby's TSort library to do this directly in Ruby, but it takes too long for our purposes. I need a solution that can push this down to the MySQL level, and I'm thinking of a stored procedure to handle this.

So far, my SQL query looks like this:

delimiter //

CREATE PROCEDURE has_cycle()
       BEGIN
SELECT 
l1.account_id as account_id,
l1.managee_id as l1_id,
l2.managee_id as l2_id,
l3.managee_id as l3_id,
l4.managee_id as l4_id,
l5.managee_id as l5_id,
l6.managee_id as l6_id,
l7.managee_id as l7_id,
l8.managee_id as l8_id,
l9.managee_id as l9_id,
l10.managee_id as l10_id,
l11.managee_id as l11_id,
l12.managee_id as l12_id,
l13.managee_id as l13_id,
l14.managee_id as l14_id,
l15.managee_id as l15_id
from managee_managers l1
LEFT JOIN managee_managers l2
ON l1.manager_id = l2.managee_id
LEFT JOIN managee_managers l3
ON l2.manager_id = l3.managee_id
LEFT JOIN managee_managers l4
ON l3.manager_id = l4.managee_id
LEFT JOIN managee_managers l5
ON l4.manager_id = l5.managee_id
LEFT JOIN managee_managers l6
ON l5.manager_id = l6.managee_id
LEFT JOIN managee_managers l7
ON l6.manager_id = l7.managee_id
LEFT JOIN managee_managers l8
ON l7.manager_id = l8.managee_id
LEFT JOIN managee_managers l9
ON l8.manager_id = l9.managee_id
LEFT JOIN managee_managers l10
ON l9.manager_id = l10.managee_id
LEFT JOIN managee_managers l11
ON l10.manager_id = l11.managee_id
LEFT JOIN managee_managers l12
ON l11.manager_id = l12.managee_id
LEFT JOIN managee_managers l13
ON l12.manager_id = l13.managee_id
LEFT JOIN managee_managers l14
ON l13.manager_id = l14.managee_id
LEFT JOIN managee_managers l15
ON l14.manager_id = l15.managee_id
LIMIT 1000;

END//

delimiter ;

I built it with up to 15 levels, which admittedly is a bit of overkill given we'll only support up to 10 layers in the management hierarchy.

This produces results which look like this:

CALL has_cycle;

+------------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| account_id | l1_id   | l2_id   | l3_id    | l4_id    | l5_id    | l6_id    | l7_id    | l8_id    | l9_id    | l10_id   | l11_id   | l12_id   | l13_id   | l14_id   | l15_id   |
+------------+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
|    3708867 | 6588137 |    NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |
|    3111155 | 4800685 | 4800555 |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |
|    3413985 | 6604007 | 5451955 |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |
|    3413985 | 6604057 | 5452245 |  5451955 |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |
|    1598855 | 2667475 | 5888957 |  5012155 | 10635375 |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |
|    4033317 | 6131407 | 6131187 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |
|    3952447 | 6036007 |    NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |     NULL |

You can see on the 6th row that there's a cycle in the row data which extends all the way to column 15:

+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| l1_id   | l2_id   | l3_id    | l4_id    | l5_id    | l6_id    | l7_id    | l8_id    | l9_id    | l10_id   | l11_id   | l12_id   | l13_id   | l14_id   | l15_id   |
+---------+---------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 6131407 | 6131187 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |  6198247 | 15245335 |  9736545 |  6198267 |
6198267 l3_id => 6198247 l4_id => 15245335 l5_id => 9736545 l6_id => 6198267 l7_id

So 6198267 is both the l3_id and l7_id here.

Question

I'm not sure how to add the cycle detection logic to the existing query I have above. I'd love to be able to tell you that I tried X, Y, and Z strategies, but I don't even know where to start.

This is MySQL 5.7, not 8.0, so I know I can't use recursive CTEs.

Richie Thomas
  • 3,073
  • 4
  • 32
  • 55

0 Answers0