Many to many self reference without id numbered

Question

I want to make a relationship between two Tags entities, but I don't like the typical way it is handled in the RDBMS databases.

Like here: https://stackoverflow.com/a/35784048/1624397

INSERT INTO RECOMMENDED_BOOKS (Book_id1, Book_id2) VALUES (1, 2)
INSERT INTO RECOMMENDED_BOOKS (Book_id1, Book_id2) VALUES (1, 3)

Book_id1, Book_id2...

Or another "bad" example I'm looking for an alternative to (which makes sense in this case, anyway):

Self-referencing to a User friendsWithMe and myFriends.

If I do something like tag_id1 and tag_id2 I either will be forced to search for whether there is a relation between both twice, or be forced to keep redundant data.

Is there any alternative solution?

Preferably the solution was storage-agnostic.

Postgres specific, but some alternatives are also applicable for other DBMS: http://www.databasesoup.com/2015/01/tag-all-things.html — , Jan 07 '18 at 15:27
I suggest you look into graph databases to store such data. They're built for this and will easily handle that kind of problem. Also allowing to efficiently query complex relationships that would choke relational database. — Justinas Marozas, Jan 07 '18 at 15:34
@a_horse_with_no_name Thanks, I had a similar solution (array, json) for this project when it was on NoSQL. But now, too many relations were required and forced us to move to RDMBS, because of maintainance and performance. — forsberg, Jan 07 '18 at 15:36
@JustinasMarozas Thanks, that what I was thinking before starting the project. However, after analyzing cons and pros, I concluded using Graph DB just because of this single non-key (though indeed frequently used) feature wouldn't be a sufficient argument for changing everything. — forsberg, Jan 07 '18 at 15:39
Please check my answer [here](https://stackoverflow.com/questions/47881198/how-to-design-a-relational-database-for-associating-multiple-tags-with-id/47916276#47916276) only for TAG management, it is not mature, but can be very good for starting new solutions. — Gholamali Irani, Jan 07 '18 at 18:40
Your question is not clear. It is just a series of fragments of descriptions & examples. Please edit your question to explain what 'bad' means. (Putting things in quotes doesn't clarify. Clarifying clarifies.) You do give an example near the end, by you don't explain what you mean by twice vs redundant. Whatever is in that link, say what that is and why it is relevant to whatever you mean by 'bad'. Questions should be self-contained. — philipxy, Jan 08 '18 at 02:22

score 1 · Answer 1 · answered Jan 07 '18 at 19:07

If I understand correctly, you have a problem with symmetric relationships since there are two ways to represent any pair of associated tags. Recording both ways result in redundant data, e.g. (1, 2) represents the same relationship as (2, 1). Recording only one of the two, without a symmetry-breaking rule, requires more complex queries, e.g. WHERE (tag_id1, tag_id2) IN ((1, 2), (2, 1)).

The trick is to introduce a symmetry-breaking rule, e.g. tag_id1 <= tag_id2. When inserting / updating data, you have to enforce the rule. That's easy if your DBMS supports check constraints, if not, you can consider using a trigger to do the same.

This simplifies queries - you can sort the arguments you want to search for so that you only have to search for a single permutation, e.g. (1, 2).

Perhaps one day we'll have DBMSs with optimized storage engines for symmetric relationships, trees, and so on.

You can easily create a constraint that prevents inserting `(2,1)` is `(1,2)` is already present — , Jan 07 '18 at 21:21

score 0 · Answer 2 · answered Jan 07 '18 at 20:20

I'm not aware of a way to solve many-to-many table without data redundancy and have simple queries in relational databases.

You could cheat and create a view that duplicates the data on query time, and it would look something like this:

CREATE VIEW VW_Friends
AS
SELECT PersonID, FriendID
FROM Friends
UNION
SELECT FriendID, PersonID
FROM Friends

I believe that would be slow and not-very-intuitive and I wouldn't generally recommend it, but it is a possible solution.

In your place I would go with redundant data, because that would be optimized for SELECTing data and in most cases table like this will have many more reads than writes.

If that is not the case and you have more writes than reads - don't duplicate the data and have awkward SELECTs with queries on both columns.

I hope this helps.

Many to many self reference without id numbered

2 Answers2