23

I'm a newbie to SQL and I'm jumping in head first trying to learn as much as possible as I'm coding, which is difficult as I'm designing the database I'll have to live with for a while so I want to make sure I do it right. I learned the basics of the Many-to-many bridge tables, but what if the two fields are the same type? Let's say a social network with thousands of users, and how would you create a table to keep track of who is friends with who? What if there is additional data about each relationship, say... "date friended" for example. Knowing that there will be queries like "show all friends of userX friended between dateY and dateZ". My database would have several situations like this and I can't figure out an efficient way to do it. Since it's coming up a lot for me, I figure others have figured out the best way to design the tables, right?

Dan Goodspeed
  • 3,484
  • 4
  • 26
  • 35
  • 1
    I don't understand what you mean by, "what if the two fields are the same type?" You can still have a join table and add additional attributes to it. – OldProgrammer Sep 04 '13 at 00:14
  • I mean, when I learned many-to-many in context of users and bands, where users could "favorite" bands. So I was told to make a users-bands bridge table, where one column is users, and the other column is "favorited bands". Two different types of data. In this "same type" method, both columns would be the same type ("users")... so... I'd have to include each relationship twice, and all the metadata for the relationship twice. Does that make sense? – Dan Goodspeed Sep 04 '13 at 00:29
  • 1
    The "type" of objects (or rows) is not an important consideration when considering a many-to-many relationship. Here is the best way to think of it: You have two tables, Users and Groups. A User may belong to many Groups (a one-to-many) ... but a Group may also be comprised of many Users (also a one-to-many). Therefore, a many-to-many relationship may be thought of as a bidirectional one-to-many relationship. The bridge table is the structure that mediates this relationship for you. A User is not a Group, but even though the types are different they still may possess this relationship. – scottb Sep 04 '13 at 00:34
  • I'm not sure I follow. How would the groups table be designed without redundancy? And my likely follow-up question: How would queries work on it? – Dan Goodspeed Sep 04 '13 at 00:38
  • So this is all evidence as to why social networks utilise graph databases instead of relational databases. Recommend looking at links off the following http://en.wikipedia.org/wiki/Graph_database – SteveChapman Sep 04 '13 at 01:06
  • 1
    Basically he's asking about how you should model a symmetrical many-to-many relationship. – RBarryYoung Sep 04 '13 at 01:19
  • 1
    Have a look at this thread from the dba stackexchange: http://dba.stackexchange.com/questions/10199/how-should-i-design-a-relationship-table-for-friendship Also take a look at this sample lesson from 9Lessons for more ideas: http://www.9lessons.info/2010/04/database-design-create-tables-and.html – Staccato Dec 28 '13 at 02:17

4 Answers4

35

Create a User table then a Relationships table where you store the id of the two friend and any kind of information about their relationship.

SQL diagram

SQL diagram

MySQL code

CREATE TABLE `Users` (
  `id` TINYINT NOT NULL AUTO_INCREMENT DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `Relationships` (
  `id` TINYINT NOT NULL AUTO_INCREMENT DEFAULT NULL,
  `userid` TINYINT NULL DEFAULT NULL,
  `friendid` TINYINT NULL DEFAULT NULL,
  `friended` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

ALTER TABLE `Relationships` ADD FOREIGN KEY (userid) REFERENCES `Users` (`id`);
ALTER TABLE `Relationships` ADD FOREIGN KEY (friendid) REFERENCES `Users` (`id`);

SQL selection

After you fill up the tables with data, you can create your SQL SELECT query to get all of your friends. Your friends are those whose id is in one side side while your id is in the other side. You check both sides for your id so you don't need to store relationships twice. Also you have to exclude your id, because you can't be your own friend (in a normal, healthy world).

SELECT *
FROM Users u
   INNER JOIN Relationships r ON u.id = r.userid
   INNER JOIN Relationships r ON u.id = r.friendid
WHERE
   (r.userid = $myid OR r.friendid = $myid)
   AND r.friended >= $startdate
   AND r.friended <= $enddate
   AND u.id != $myid;

Where $myid, $startdate and $enddate can be PHP variables so in double quotes you can pass this code directly to your database.

totymedli
  • 29,531
  • 22
  • 131
  • 165
  • So you're saying store every userA/userB pair twice, once with userA in the userid column, and once with userA in the friendid column? And then store the information in the friended column twice as well? – Dan Goodspeed Sep 04 '13 at 00:43
  • 1
    Extended the SQL query. Now you don't need to store duplications. – totymedli Sep 04 '13 at 01:05
  • This is the first answer that I've seen that might do it, though I'm still somewhat new at all the queries so I'm not 100% sure I follow that SELECT. Are you selecting from "Users u" (I believe that's making u an alias to Users?) so the result will be the full row of user data from the Users table? If I just wanted the list of $myid's friends and their friendDate, the SELECT query wouldn't even have to mention the Users table, right? – Dan Goodspeed Sep 04 '13 at 01:44
  • "Where $myid, $startdate and $enddate can be PHP variables" - be careful regarding SQL injection, either have some heavy sanitisation/escaping on the variables, or preferably use a library such as PDO or MySQLi (in PHP) to do this for you. For more info, see: http://stackoverflow.com/questions/60174/how-can-i-prevent-sql-injection-in-php – Liam Gray Nov 01 '15 at 22:27
4

A single linking table suffices, like so:

People( PersonId bigint, Name nvarchar, etc )
Friends( FromPersonId bigint, ToPersonId bigint, DateAdded datetime )

Example queries:

Who is friends with me? (i.e. people who have added me as a friend, but not necessarily reciprocated)

SELECT
    People.Name
FROM
    Friends
    INNER JOIN People ON Friends.FromPersonId = People.PersonId
WHERE
    Friends.ToPersonId = @myPersonId

Who added me between two dates?

SELECT
    People.Name
FROM
    Friends
    INNER JOIN People ON Friends.FromPersonId = People.PersonId
WHERE
    Friends.ToPersonId = @myPersonId
    AND
    Friends.DateAdded >= @startDate
    AND
    Friends.DateAdded <= @endDate
Rotem
  • 21,452
  • 6
  • 62
  • 109
Dai
  • 141,631
  • 28
  • 261
  • 374
  • I guess I wasn't clear. There will be no "from" and "to" with the users (or the other similar tables I'm working with). Maybe instead picture it as zip codes instead of users. Every zip code has a distance value associated with every other zip code. It doesn't matter if you're zipA->zipB, or zipB->zipA. It's the same value. Does that explain better the sort of data structure I'm looking to design? – Dan Goodspeed Sep 04 '13 at 00:34
  • What you're proposing (a table of zipcodes with distances to each other) is an incorrect design because it will use `n^2` space (i.e. a table of 10 zip codes will require 100 distance rows, 100 zip codes will require 10,000 distances, and so on). Instead, consider using geospatial types if your DBMS supports it. Both MS-SQL and MySQL support geospatial queries. – Dai Sep 04 '13 at 00:44
  • Geospatial queries doesn't take into account the fact there's a big lake in between two locations that someone would have to drive around, so that doesn't work. I have the road distances as data, and didn't plan on storing every match, just the closest, but that's getting off on a tangent from my original question. – Dan Goodspeed Sep 04 '13 at 00:54
4

The Model

A model stretching over three tables would be an option. You'd have the obvious table user with all the user's specifics (Name, Date of Birth, ...).

CREATE TABLE `user` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45),
  `dob` DATE,
  PRIMARY KEY (`id`)
);

Second, a table connection of the user could contain privileges granted to a connection (can this connection see my photo album), and importantly, would refer to a table friendship. We need the table connection in between because one user can be connected into many friendships.

CREATE TABLE `connection` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `user_id` INT NOT NULL,
  `friendship_id` INT NOT NULL,
  `privilege_mask` TINYINT,
  PRIMARY KEY (`id`)
);

friendship in turn could include shared details like when this friendship was established. Users that are connected to the same friendship are friends.

CREATE TABLE `friendship` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `met_first_time` DATE,
  PRIMARY KEY (`id`)
);

This way would be a more realistic model than the other solutions posted so far in that it avoids directionality (unreciprocated friendship - which shouldn't exist!), but it would be a little more work to implement.

Query Friends

An example that should query the names of your friends, may be (although not tested):

SELECT B.name FROM user A
  INNER JOIN connection conn_A ON conn_A.user_id = A.id
    INNER JOIN connection conn_B ON conn_A.friendship_id = conn_B.friendship_id
    INNER JOIN user B ON conn_B.user_id = B.id
    INNER JOIN friendship ON friendship.id = conn_A.friendship_id
  WHERE A.name = 'Dan' AND A.id <> B.id AND
    friendship.met_first_time BETWEEN '2013-4-1' AND '2013-6-30';

You may notice that if you don't care about the date when you made friends, you don't need to JOIN to the friendship table because connections already share friendship_id keys. The essence of any such query would be the JOIN between conn_A and conn_B on conn_A.friendship_id = conn_B.friendship_id.

s.bandara
  • 5,636
  • 1
  • 21
  • 36
  • Maybe I chose I bad example of users/friendships because people have a preconception that it involves sending a friend request which needs to be accepted. There is no difference between the userA->userB or userB->userA. There's either a friendship connection or not. But if there is, there's metadata about it (date, etc). The other example I gave of a similar data structure is zip codes. Every zip code has a distance away from every other zip code. zipA->zipB=zipB->zipA. How should all those distances be stored without repeating? – Dan Goodspeed Sep 04 '13 at 00:48
  • 2
    Exactly: my point is that there is no directionality in friendship. This is why I think the model I outlined is appropriate for you. Two `user` entries can share the same `friendship`, i.e. each have a `connection` to that friendship. Shared metadata (your example: date) would be in the table `friendship`, and metadata of "directional" nature (e.g. privileges given from one user to the other) would be in the table `connection`. – s.bandara Sep 04 '13 at 02:18
  • Ok, I understand your model now. I'm just concerned visualizing what kind of select queries I'd have to write, being a beginner who would have a lot of trial and error. I know enough to know it's possible. :-) So in your model, what would the SELECT be for finding all friends of userX who met between dateY and dateZ? – Dan Goodspeed Sep 04 '13 at 03:11
  • 1
    Just added an example query to find the friends you made between beginning of April and end of June. – s.bandara Sep 04 '13 at 04:19
  • Thanks :) Though it's SELECT queries like those that really makes me question just how interested I am in learning SQL :-p – Dan Goodspeed Sep 04 '13 at 06:43
2

You should use the ID's of the two users as PRIMARY KEY in a relationship table (the relation would be unique even if not bi-directional). Something like that

CREATE TABLE Users ( id int(9) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id) );

CREATE TABLE Relationships ( id1 int(9) NOT NULL, id2 int(9) NOT NULL, friended TIMESTAMP NOT NULL, PRIMARY KEY (id1, id2) );

Please note:

  • id1 and id2 reference Users table (the code above was very simplified)
  • even if the relationship is not "bi-directional" you can think that if you have id1 - id2 it seems that id1 user add id2 user as a friend but not necessarily the opposite - THEN id2 user can add id1 user as friend - in the table of relationships you can have these possible combinations:
    • id1-id2, (only 1 add 2 as a friend)
    • id2-id1, (only2 add 1 as a friend)
    • id1-id2 AND id2-id1 (both - it means 2 lines in relationships table, and both are MUTUAL friends)
bito_
  • 169
  • 2
  • 10