I have a question regarding efficiency when storing a follower-following relationship in SQL databases.
I understand that ideally this would be stored with a join table, as suggested in Followers/following database structure by user Branko Dimitrijevic:
CREATE TABLE follower (
user_id INT, -- References user.
follower_id INT, -- References user.
PRIMARY KEY (user_id, follower_id),
UNIQUE INDEX (follower_id, user_id)
);
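For comparison, I think the "is this user following that user" check against the join table would look roughly like the query below, assuming user_id is the user being followed and follower_id is the follower (the ids 2 and 123 are made up):

SELECT EXISTS (
    SELECT 1
    FROM follower
    WHERE user_id = 2        -- the user being followed
      AND follower_id = 123  -- the potential follower
);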
However, I was wondering whether I could create a JSONB column instead that stores all the users one user is following like so:
CREATE TABLE follower (
user_id INT, -- References user.
following JSONB DEFAULT '{}'::JSONB,
PRIMARY KEY (user_id)
);
The idea is that the keys of the following column are the ids of the users a particular user is following, e.g.:
{
  "user_id_2": true
}
Lookups should then be easy, since we can check with SQL whether a user is following someone like so:
SELECT following -> 'user_id_2' FROM follower WHERE user_id = xxx;
or
SELECT following ? 'user_id_2' FROM follower WHERE user_id = xxx;
The above query would give true if the user xxx is following user_id_2. Basically I'm suggesting using the JSONB column type like a set.
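To maintain the set, I imagine adding a new entry would look roughly like the sketch below (user ids 123 and 456 are made up, and I'm assuming the row for user 123 already exists):

-- Add user 456 to the set of users that 123 is following:
UPDATE follower
SET following = following || '{"456": true}'::JSONB
WHERE user_id = 123;

Since || simply overwrites an existing key, duplicates can't accumulate.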
What do you think? Would this be fast? Memory-wise, would both approaches be about the same?
User sticky bit linked a proposed answer to my question, but let me address why I don't think it applies here. First, I'm talking about a JSON object, not an array. Now let me go through the points from that proposed solution one by one:
- Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5 -- Actually you can validate each id at the moment you add it as a new key.
- Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity. -- Not necessary in my use case.
- Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5 -- duplicate keys would overwrite one another
- Can’t delete a value from the list without fetching the whole list. -- a key can be deleted easily with the - operator (see the sketch after this list)
- Can't store a list longer than what fits in the string column. -- true, JSONB has a size limit of roughly 255 MB (see Size limit of JSON data type in PostgreSQL)
- Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL: idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\b2\b' -- easy to check whether a key exists (I've added queries for it already), and a GIN index should cover the reverse lookup (see the sketch after this list)
- Hard to count elements in the list, or do other aggregate queries. -- easy to count with jsonb_object_keys (see the sketch after this list)
- Hard to join the values to the lookup table they reference. -- true
- Hard to fetch the list in sorted order. -- inapplicable
- Hard to choose a separator that is guaranteed not to appear in the values -- inapplicable
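For the deletion, counting, and reverse-lookup points above, here is a rough sketch of what I have in mind (user ids 123 and 456 are made up, and the index name is just an example):

-- Delete one entry from the set:
UPDATE follower
SET following = following - '456'
WHERE user_id = 123;

-- Count how many users 123 is following:
SELECT count(*)
FROM follower, jsonb_object_keys(following)
WHERE user_id = 123;

-- For "who follows user 456?", a GIN index on the JSONB column should let
-- the ? operator avoid a full table scan:
CREATE INDEX follower_following_gin ON follower USING GIN (following);
SELECT user_id FROM follower WHERE following ? '456';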
I suppose the main problem with my suggestion is the roughly 255 MB size limit on a JSONB column.