24

My website has a followers/following system (like Twitter's). My dilemma is creating the database structure to handle who's following who.

What I came up with was creating a table like this:

 id  |  user_id  |  followers |  following
  1  |    20     |  23,58,84  |  11,156,27
  2  |    21     |  72,35,14  |  6,98,44,12
 ... |   ...     |    ...     |     ...

Basically, I was thinking that each user would have a row with columns for their followers and the users they're following. The followers and people they're following would have their user id's separated by commas.

Is this an effective way of handling it? If not, what's the best alternative?

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Burrows
  • 475
  • 2
  • 8
  • 18

4 Answers4

64

That's the worst way to do it. It's against normalization. Have 2 seperate tables. Users and User_Followers. Users will store user information. User_Followers will be like this:

id | user_id | follower_id
1  | 20      | 45
2  | 20      | 53
3  | 32      | 20

User_Id and Follower_Id's will be foreign keys referring the Id column in the Users table.

regulus
  • 1,572
  • 12
  • 18
  • Yeah, that was the second structure I came up with but thought it wasn't conventional to do that. Thanks. – Burrows Nov 01 '13 at 19:44
  • 5
    @Vad I sure do not work for Oracle :) Normalization is a concept of relational DB systems and data, it won't be something for past as long as these systems live :) – regulus Nov 20 '15 at 08:57
  • 2
    @regulus Can you explain why we need an id column here? – Nuwan Karunarathna Sep 06 '16 at 13:24
  • 11
    @Nuwannnz we don't need an Id column, just a habit. a composite key of user and follower id's are totally ok as a primary key. – regulus Sep 29 '16 at 22:30
  • @regulus Hi. Can you explain how can we calculate if follow/follow back /friend like of Instagram's using this modelling method. – Nirjal Paudel Jun 27 '21 at 15:19
19

There is a better physical structure than proposed by other answers so far:

CREATE TABLE follower (
    user_id INT, -- References user.
    follower_id INT,  -- References user.
    PRIMARY KEY (user_id, follower_id),
    UNIQUE INDEX (follower_id, user_id)
);

InnoDB tables are clustered, so the secondary indexes behave differently than in heap-based tables and can have unexpected overheads if you are not cognizant of that. Having a surrogate primary key id just adds another index for no good reason1 and makes indexes on {user_id, follower_id} and {follower_id, user_id} fatter than they need to be (because secondary indexes in a clustered table implicitly include a copy of the PK).

The table above has no surrogate key id and (assuming InnoDB) is physically represented by two B-Trees (one for the primary/clustering key and one for the secondary index), which is about as efficient as it gets for searching in both directions2. If you only need one direction, you can abandon the secondary index and go down to just one B-Tree.

BTW what you did was a violation of the principle of atomicity, and therefore of 1NF.


1 And every additional index takes space, lowers the cache effectiveness and impacts the INSERT/UPDATE/DELETE performance.

2 From followee to follower and vice versa.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • what about following? – Just a coder Aug 28 '21 at 14:11
  • @Justacoder Sorry, I don't understand the question. – Branko Dimitrijevic Aug 29 '21 at 04:39
  • the questions was about followers and following. But it looks like your answer only addresses `followers`. So i was asking, what about `Following` ? – Just a coder Aug 29 '21 at 11:27
  • @Justacoder You can query both for "who are the followers of the given person" and "who the given person is following" from this model. The two indexes mentioned in the answer make both directions efficient. – Branko Dimitrijevic Aug 30 '21 at 05:50
  • 2
    @BrankoDimitrijevic what is the point of the 2nd index; UNIQUE() when you have the PRIMARY KEY? – Justin Oct 09 '21 at 17:57
  • @Justin It allows for rapid search from `follower_id` to `user_id`. In other words, it helps performance when querying for "who the given person is following". – Branko Dimitrijevic Oct 10 '21 at 12:52
  • @BrankoDimitrijevic you have already foreign keys. i don't think it worth additional space. Do you have any bench data or info? – jsHate Aug 10 '22 at 06:54
  • @jsHate What do you mean by "you have already foreign keys"? Could you please clarify. – Branko Dimitrijevic Aug 10 '22 at 11:28
  • @BrankoDimitrijevic you have pointed references user. I assume we add foreign key to columns. In mysql and postgresql it auto creates index. – jsHate Aug 10 '22 at 19:24
  • @BrankoDimitrijevic and one more thing about surrogate key. Most of the time I want preserve order of followers. And besides I want data to be written in sequence without random jumping around. So I add auto increment id column for peace of mind. – jsHate Aug 10 '22 at 19:27
  • @BrankoDimitrijevic one more question/ suggestion. All this follower/following/followed users can make headache. Don't you think naming user_id_from and user_id_to or source_user_id and target_user_id as better naming to follower / followed. – jsHate Aug 10 '22 at 19:32
  • 1
    @jsHate I'll try to answer your questions in order... (1) Yes, foreign keys should exist in this table, as indicated by the comments in my CREATE TABLE statement. (2) In principle, a FK can exist without an index. For example, both MS SQL Server and Oracle allow index-less FKs, which may be a good idea if you never try to delete the parent (i.e. referenced) row. But I haven't worked in MySQL in years, so I'm a bit fuzzy on how it works there exactly. – Branko Dimitrijevic Aug 11 '22 at 16:11
  • (3) Additional fields can exist in this table, including a field that defines position. But the table should probably not be clustered on it. (4) "Writing in sequence" may or may not be desired. For example, writing new rows to the same page (the "right-most" leaf of the B-tree) may cause contention. (5) Naming is hard and depends on the context. – Branko Dimitrijevic Aug 11 '22 at 16:11
6

One weakness of that representation is that each relationship is encoded twice: once in the row for the follower and once in the row for the following user, making it harder to maintain data integrity and updates tedious.

I would make one table for users and one table for relationships. The relationship table would look like:

id | follower | following
1  | 23       | 20
2  | 58       | 20
3  | 84       | 20
4  | 20       | 11
...

This way adding new relationships is simply an insert, and removing relationships is a delete. It's also much easier to roll up the counts to determine how many followers a given user has.

couchand
  • 2,639
  • 1
  • 21
  • 27
3

No, the approach you describe has a few problems.

First, storing multiple data points as comma-separated strings has a number of issues. It's difficult to join on (and while you can join using like it will slow down performance) and difficult and slow to search on, and can't be indexed the way you would want.

Second, if you store both a list of followers and a list of people following, you have redundant data (the fact that A is following B will show up in two places), which is both a waste of space, and also creates the potential of data getting out-of-sync (if the database shows A on B's list of followers, but doesn't show B on A's list of following, then the data is inconsistent in a way that's very hard to recover from).

Instead, use a join table. That's a separate table where each row has a user id and a follower id. This allows things to be stored in one place, allows indexing and joining, and also allows you to add additional columns to that row, for example to show when the following relationship started.

Jacob Mattison
  • 50,258
  • 9
  • 107
  • 126