3

Suppose I have a MySQL table that defines a collection of things, each of which is associated with either 1 or 2 owners. For example:

CREATE TABLE thing (
    id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
    , name CHAR(10)
    , first_owner INT UNSIGNED NOT NULL
    , second_owner INT UNSIGNED DEFAULT NULL
    );

+----+------------+-------------+--------------+
| id | name       | first_owner | second_owner |
+----+------------+-------------+--------------+
| 1  | skateboard | Joe         | NULL         |
| 2  | flashlight | Joe         | NULL         |
| 3  | drill      | Joe         | Erica        |
| 4  | computer   | Erica       | NULL         |
| 5  | textbook   | Diane       | NULL         |
| 6  | cell phone | Amy         | Diane        |
| 7  | piano      | Paul        | Amy          |
+----+------------+-------------+--------------+

Each distinct owner is a node of a graph, and two owners in the same row constitute an edge between their nodes. A graph drawn from the above example rows looks like this:

In this example, there are two components: Joe and Erica are one; Diane, Paul and Amy are the other.

I want to identify these components in my table, so I add another column:

ALTER TABLE thing ADD COLUMN `group` INT UNSIGNED;

How could I write an UPDATE statement that would populate this new column by uniquely identifying the connected component to which the row belongs? Here's an example of an acceptable result for the above example rows:

+----+------------+-------------+--------------+-------+
| id | name       | first_owner | second_owner | group |
+----+------------+-------------+--------------+-------+
| 1  | skateboard | Joe         | NULL         | 1     |
| 2  | flashlight | Joe         | NULL         | 1     |
| 3  | drill      | Joe         | Erica        | 1     |
| 4  | computer   | Erica       | NULL         | 1     |
| 5  | textbook   | Diane       | NULL         | 2     |
| 6  | cell phone | Amy         | Diane        | 2     |
| 7  | piano      | Paul        | Amy          | 2     |
+----+------------+-------------+--------------+-------+

I could do this with a stored procedure, but my actual scenario involves more tables and millions of rows, so I'm hoping there's a clever way to do this without looping through cursors for a week.

This is a simplified example for the purpose of illustrating the problem. Each component is supposed to represent a "household" and most will have only 1 or 2 nodes, but those with more nodes are especially important. There isn't necessarily any strict upper limit to the size of a household.

Air
  • 8,274
  • 2
  • 53
  • 88
  • It's not easy to get the complete path of an adjacency list with the current possibilities of MySQL (5.6.x or so). – VMai Jul 09 '14 at 20:46
  • 2
    any limit to the size of each group (IE, can you get a fifty person group?...or is three the upper limit?). – Twelfth Jul 09 '14 at 20:49
  • @Twelfth The depth could be the real problem. OP could self join the table (n-1 times, if the depth is n). Think of "more tables" and "millions of rows". – VMai Jul 09 '14 at 20:54
  • @Twelfth We might be able to safely assume a limit of 4 or 5 nodes per component. – Air Jul 09 '14 at 20:59
  • 1
    MySQL lacks recursive abilities required to run queries of this kind. There are certain witchcraft techniques which allow to overcome this, but they are not efficient and not reliable. You better parse this on the client or in a stored procedure. – Quassnoi Jul 09 '14 at 21:17
  • @airthomas - I've looked into a few methods on this and everything I can find isn't feasible in MySQL (Quassnoi above). The only thing I can think of is starting on the second owner column, left joining the table to itself where first owner = second owner and repeating this logic to it's upper limit of 4-5 nodes...but even thats a bit messy and it will likely miss first owners with multiple second owners assoiciated. Not sure if I can answer this for you in MySQL – Twelfth Jul 15 '14 at 18:21
  • @Twelfth I've pretty much resigned myself to doing this outside of the database. It's not ideal, but it fits in memory so should work out. I appreciate the effort. – Air Jul 15 '14 at 18:27
  • Related question: http://stackoverflow.com/questions/7631048/connect-by-prior-equivalent-for-mysql – Adrian Aug 15 '14 at 10:09

2 Answers2

-1

You can consider this method of creating hierarchical queries in mysql

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
        END LOOP;       
END

Also, a very good article on this topic Adjacency list vs. nested sets: MySQL

Adrian
  • 6,013
  • 10
  • 47
  • 68
  • 1
    This might be useful for other problems but my data is not hierarchical. I think you might be misunderstanding the question. (Also, did you know that the author of that blog is @Quassnoi, who commented on the question above?) – Air Aug 15 '14 at 16:59
-1

A very good answer to a related question

"What is the most efficient/elegant way to parse a flat table into a tree?"

There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

Please have a look at original question for reference.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Adrian
  • 6,013
  • 10
  • 47
  • 68
  • 3
    This doesn't really seem like an answer to me. Maybe it would be better as a comment or edited into your other answer. – Air Aug 15 '14 at 17:00