MySQL: Data structure for transitive relations

Question

I tried to design a data structure for easy and fast querying (delete, insert an update speed does not really matter for me).

The problem: transitive relations, one entry could have relations through other entries whose relations I don't want to save separately for every possibility.

Means--> I know that Entry-A is related to Entry-B and also know that Entry-B is related to Entry-C, even though I don't know explicitly that Entry-A is related to Entry-C, I want to query it.

What I think the solution is:

Eliminating the transitive part when inserting, deleting or updating.

Entry:
    id
    representative_id

I would store them as sets, like group of entries (not mysql set type, the Math set, sorry if my English is wrong). Every set would have a representative entry, all of the set elements would be related to the representative element.

A new insert would insert the Entry and set the representative as itself.

enter image description here

If the newly inserted entry should be connected to another, I simply set the representative id of the newly inserted entry to the referred entry's rep.id.

Attach B to A

enter image description here

It doesn't matter, If I need to connect it to something that is not a representative entry, It would be the same, because every entry in the set would have the same rep.id.

Attach C to B

enter image description here

Detach B-C: The detached item would have become a representative entry, meaning it would relate to itself.

enter image description here

Detach B-C and attach C to X

enter image description here

Deletion: If I delete a non-representative entry, it is self explanatory. But deleting a rep.entry is harder a bit. I need to chose a new rep.entry for the set and set every set member's rep.id to the new rep.entry's rep.id.

So, delete A in this:

enter image description here

Would result this:

enter image description here

What do you think about this? Is it a correct approach? Am I missing something? What should I improve?

Edit: Querying: So, If I want to query every entry that is related to an certain entry, whose id i know:

~~SELECT * FROM entries a LEFT JOIN entries b ON (a.rep_id = b.rep_id) WHERE a.id = :id~~

SELECT * FROM AlkReferencia
WHERE rep_id=(SELECT rep_id FROM AlkReferencia
    WHERE id=:id);

About the application that requires this:

Basically, I am storing vehicle part numbers (references), one manufacturer can make multiple parts that can replace another and another manufacturer can make parts that are replacing other manufacturer's parts.

Reference: One manufacturer's OEM number to a certain product.

Cross-reference: A manufacturer can make products that objective is to replace another product from another manufacturer.

I must connect these references in a way, when a customer search for a number (doesn't matter what kind of number he has) I can list an exact result and the alternative products.

To use the example above (last picture): B, D and E are different products we may have in store. Each one has a manufacturer and a string name/reference (i called it number before, but it can be almost any character chain). If I search for B's reference number, I should return B as an exact result and D,E as alternatives.

So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an ALL-IN-ONE database. Most of the time, when I upload references I got from a manufacturer (somehow, most of the time from manually, but I can use catalogs too), I only get a list where the manufacturer tells which other reference numbers point to his numbers.

Example.:

Asas filter manufacturer, "AS 1" filter has these cross references (means, replaces these):

GOLDEN SUPER  -->  1
ALFA ROMEO  -->  101000603000
ALFA ROMEO  -->  105000603007
ALFA ROMEO  -->  1050006040
RENAULT TRUCKS (RVI)  -->  122577600
RENAULT TRUCKS (RVI)  -->  1225961
ALFA ROMEO  -->  131559401
FRAD  -->  19.36.03/10
LANDINI  -->  1896000
MASSEY FERGUSON  -->  1851815M1
...

It would took ages to write all of the AS 1 references down, but there is many (~1500 ?). And it is ONE filter. There is more than 4000 filter and I need to store there references (and these are only the filters). I think you can see, I can't connect everything, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> alfa romeo 101000603000) and (as 1 --> alfa romeo 105000603007). That is why I want to organize them as sets. Each set member would only connect to one other set member, with a rep_id, that would be the representative member. And when someone would want to (like, admin, when uploading these references) attach a new reference to a set member, I simply INSERT INTO References (rep_id,attached_to_originally_id,refnumber) VALUES([rep_id of the entry what I am trying to attach to],[id of the entry what I am trying to attach to], "16548752324551..");

Another thing: I don't need to worry about insert, delete, update speed that much, because it is an admin task in our system and will be done rarely.

MySQ>L doesn't support recursive queries currently. So you've got to build some helper structures to get a result for any depth of relation. — VMai, Aug 13 '14 at 07:01
That is what I am trying to avoid, by storing representative id to every entry that is connected — szab.kel, Aug 13 '14 at 07:01
Well you can't query such structures in MySQL efficiently. You've got to add another self join for every step in depth. — VMai, Aug 13 '14 at 07:03
Querying a set would be easy this way: SELECT * FROM entries WHERE XYentry.rep_id = :repid This returns everything that the set has in it. — szab.kel, Aug 13 '14 at 07:08
That are the first grade related objects. To get second grade related ones, then you've got to join your table (hence self join) and apply the same condition to the second `table`. For third grade you join your table another time and do the same, ... — VMai, Aug 13 '14 at 07:12
I edited my question. I am sure, that I won't need recursive querying, because every entry know its representative entry's id. — szab.kel, Aug 13 '14 at 07:36
I don't get you: You said **even though I don't know explicitly that Entry-A is related to Entry-C, I want to query it. (And any depth!)** This relationship requires an arbitrary complex query. Imagine it's a relation via 10 entities. Then you've got to self join your table 9 times. Have a try at second grade relations please. — VMai, Aug 13 '14 at 07:40
I made a mistake, I don't store depth in the data structure, I only store which set they are in. It doesn't matter which entry I want to attach another entry, all it matters what set they are in (group of entries). You can see that I disregard this information when connecting C to B — szab.kel, Aug 13 '14 at 07:43
You seem to be trying to manage a tree. There are a few [very elegant solutions](http://stackoverflow.com/q/192220/1446005) out there. You might be particularly interested in the [Nested Sets](http://stackoverflow.com/a/194031/1446005) approach. — RandomSeed, Aug 13 '14 at 08:12
Not a tree, a tree would means, when I am inserting C and attaching it to B, I would connect it to B, but I am connecting it to B's rep_id (A). — szab.kel, Aug 13 '14 at 08:15

score -2 · Answer 1 · answered Aug 16 '14 at 02:50

It is not clear what you are trying to do, and it is not clear that you understand how to think & design relationally. But you seem to want rows satisfying "[id] is a member of the set named by member [rep_id]".

Stop thinking in terms of representations and pointers. Just find fill-in-the-(named-)blank statements ("predicates") that say what you know about your application situations and that you can combine to ask about your application situations. Every statement gets a table ("relation"). The columns of the table are the names of the blanks. The rows of the table are the ones that make its statement true. A query has a statement built from its table's statements. The rows of its result are the ones that make its statement true. (When a query has JOIN of table names its statement ANDs the tables' statements. UNION ORs them. EXCEPT puts in AND NOT. WHERE ANDs a condition. Dropping a column by SELECT corresponds to logical EXISTS.)

Maybe your application situations are a bunch of cells with values and pointers. But I suspect that your cells and pointers and connections and attaching and inserting are just your way of explaining & justifying your table design. Your application seems to have something to do with sets or partitions. If you really are trying to represent relations then you should understand that a relational table represents (is) a relation. Regardless, you should determine what your table statements are. If you want design help or criticism tell us more about your application situations, not about representation of them. All relational representation is by tables of rows satisfying statements.

Do you really need to name sets by representative elements? If we don't care what the name is then we typically use a "surrogate" name that is chosen by the DBMS, typically via some integer auto-increment facility. A benefit of using such a membership-independent name for a set is that we don't have to rename, in particular by choosing an element.

MySQL: Data structure for transitive relations

1 Answers1