Suppose I have a MySQL table that defines a collection of things, each of which is associated with either 1 or 2 owners. For example:
CREATE TABLE thing (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, name CHAR(10)
, first_owner INT UNSIGNED NOT NULL
, second_owner INT UNSIGNED DEFAULT NULL
);
+----+------------+-------------+--------------+
| id | name | first_owner | second_owner |
+----+------------+-------------+--------------+
| 1 | skateboard | Joe | NULL |
| 2 | flashlight | Joe | NULL |
| 3 | drill | Joe | Erica |
| 4 | computer | Erica | NULL |
| 5 | textbook | Diane | NULL |
| 6 | cell phone | Amy | Diane |
| 7 | piano | Paul | Amy |
+----+------------+-------------+--------------+
Each distinct owner is a node of a graph, and two owners in the same row constitute an edge between their nodes. A graph drawn from the above example rows looks like this:
In this example, there are two components: Joe and Erica are one; Diane, Paul and Amy are the other.
I want to identify these components in my table, so I add another column:
ALTER TABLE thing ADD COLUMN `group` INT UNSIGNED;
How could I write an UPDATE
statement that would populate this new column by uniquely identifying the connected component to which the row belongs? Here's an example of an acceptable result for the above example rows:
+----+------------+-------------+--------------+-------+
| id | name | first_owner | second_owner | group |
+----+------------+-------------+--------------+-------+
| 1 | skateboard | Joe | NULL | 1 |
| 2 | flashlight | Joe | NULL | 1 |
| 3 | drill | Joe | Erica | 1 |
| 4 | computer | Erica | NULL | 1 |
| 5 | textbook | Diane | NULL | 2 |
| 6 | cell phone | Amy | Diane | 2 |
| 7 | piano | Paul | Amy | 2 |
+----+------------+-------------+--------------+-------+
I could do this with a stored procedure, but my actual scenario involves more tables and millions of rows, so I'm hoping there's a clever way to do this without looping through cursors for a week.
This is a simplified example for the purpose of illustrating the problem. Each component is supposed to represent a "household" and most will have only 1 or 2 nodes, but those with more nodes are especially important. There isn't necessarily any strict upper limit to the size of a household.