52

Dear database experts/programmers:

I have a mysql table with users information, such as

id      user_id         name etc.
1       userA            
2       userB   
3       userC
..      ..
..      ..

I want to create a feature like 'follow' other users something like twitter. A userA can follow userB, or userB can follow userA, or both can follow each other. For that purpose, should I create 1 table, lets say followers

id      user_id     follower_id
1           userA       userB
2           userC       userA
3           userA       userC
4           userB       userC
5           userX       userA

Now I want to find who is following a userA. I'd so sth like: Select * from followers where user_id = userA This will select userB and userC. Thats what I need.

Now I want to find, which persons userA is following (for example in above table, userA is following userC and userX. Then I should run something like Select * from followers where follower_id=userA.

My question is that, is this database design correct for this problem (considering in mind database redundancy and optimization?) Or there can be better approach than this? Thanks.

btilly
  • 43,296
  • 3
  • 59
  • 88
tcj
  • 611
  • 1
  • 7
  • 4
  • BTW, following**s** is wrong – Mohammed Noureldin Feb 12 '18 at 18:07
  • @tcj just curious to know what was your final design for this? Ideally one table with user_id and follower_id should solve this problem and the condition is we should make sure the entries should be unique by making them as primary key.. I mean CREATE TABLE follow(user_id INT, following_id INT, PRIMARY KEY(user_id, following_id), FOREIGN KEY (user_id) REFERENCES user (user_id), FOREIGN KEY (following_id) REFERENCES user (user_id), UNIQUE INDEX (following_id, user_id) ); – Pavan Sep 15 '20 at 12:15

4 Answers4

23

In general, your design is correct.

But, if user_id is unique in the table "users", you don't need the column "id" in "users". (A single table containing a unique "id" and a unique "user_id" is pretty unusual.) You also don't need the column "id" in the table "followers".

Primary key in "followers" should be (user_id, follower_id), and make sure each of those columns has a foreign key referencing "user_id" in "users".

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • 2
    Is the PK a combination of (user_id, follower_id) here? How do I query the table to get the followers for a user_id? because when I do, I may not have the followerId's. So, I am having trouble trying to understand how the combination of the both as PK will help. – xyzzz Jul 02 '20 at 06:29
  • 1
    @rond : If the OP follows my suggestion, then the PK of the table "followers" would consist of two columns: (user_id, follower_id). Query for followers of one user (let's say user 42) would look something like this: `select * from followers where user_id = 42;` – Mike Sherrill 'Cat Recall' Jul 02 '20 at 14:44
  • @MikeSherrill'CatRecall', isn't it faster to query the primary key due to indexing of primary keys? e.g. (user_id, follower_id combination)? – imagineerThat Apr 10 '21 at 22:45
  • @imagineerThat said, *"isn't it faster to query the primary key due to indexing of primary keys? e.g. (user_id, follower_id combination)?"* Not when you don't know the follower IDs. – Mike Sherrill 'Cat Recall' Apr 11 '21 at 14:52
  • @MikeSherrill'CatRecall', fair enough. It appears that querying by the first column in a multi-column key is still fast, as that first column is also indexed. So Your `select` statement would not necessitate a full table scan. Is that correct? – imagineerThat Apr 13 '21 at 03:17
  • 1
    @imagineerThat: Right. If you need to do frequent queries the "other direction" (`select * from followers where follower_id = 12;`), you might benefit from an additional index on follower_id. (Or, depending on your dbms and your application, on the pair `follower_id, user_id`. For example, PostgreSQL supports [index-only scans](https://www.postgresql.org/docs/13/indexes-index-only-scans.html). ) – Mike Sherrill 'Cat Recall' Apr 13 '21 at 11:42
13

Here is my design.

Id(PK)  userId(FK)  followerId(FK)  followedDate         unfollowedDate
1       123         456             YYYY-MM-DD HH:MI:SS  YYYY-MM-DD HH:MI:SS
...     ...         ...             ...                  ...

I assumed userId - followerId combiation is unique. I can use them as composite key and remove Id from table. Dates can also be useful. For example, if user unfollows and follows again in 5 minutes, it doesn't generate notification - I assume user made it by mistake. I can also analyze following statistics by date.

Iren Saltalı
  • 516
  • 7
  • 16
10

General tip. Use integers for ids rather than strings. There is a significant performance difference. So drop users.user_id, and rename users.id to users.user_id. Secondly your followers table should have indexes on user_id and follower_id. Again there is a significant performance benefit. I also like the idea of having a unique index on (user_id, follower_id), calling that your primary key, and dropping your id column.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • 1
    I agree on your main point that integers are better as primary/foreign keys. Yet I think that what he has called `user_id` could be something like 'login name' (and should then be renamed properly, but not removed), a thing that both is needed and has to be unique within the system. Still the `follow` table should use integer references, just as you are suggesting. – Andriy M Feb 04 '11 at 15:46
3

Yes, your design is the usual way of dealing with many-to-many relationships. Search for "modeling many-to-many database" and you will find lots of resources giving you examples of this.

Add foreign keys from your relationship table to the users table.

If your relationship involves additional information, you would put that as column in your connecting table. Maybe, for instance, the date when one user started following another.

A separate surrogate key in the connecting table, like the ID column you have added, can be useful if you will want to have other tables reference your table.

Andreas Vendel
  • 716
  • 6
  • 14