
I have a question regarding efficiency when storing a follower-following relationship in SQL databases.

I understand that, ideally, this would be stored in a join table, as suggested by user Branko Dimitrijevic in Followers/following database structure:

CREATE TABLE follower (
    user_id INT, -- References user.
    follower_id INT,  -- References user.
    PRIMARY KEY (user_id, follower_id),
    UNIQUE INDEX (follower_id, user_id)
);
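
For comparison, here is a sketch of how the basic operations would look with the join table (the user IDs below are hypothetical; user_id is the followed user and follower_id the follower):

```sql
-- Does user 42 follow user 7? A single lookup on the primary key:
SELECT EXISTS (
    SELECT 1
    FROM follower
    WHERE user_id = 7
      AND follower_id = 42
) AS is_following;

-- Follow / unfollow are a single-row insert / delete:
INSERT INTO follower (user_id, follower_id) VALUES (7, 42);
DELETE FROM follower WHERE user_id = 7 AND follower_id = 42;
```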

However, I was wondering whether I could create a JSONB column instead that stores all the users one user is following like so:

CREATE TABLE follower (
    user_id INT, -- References user.
    following JSONB DEFAULT '{}'::JSONB, 
    PRIMARY KEY (user_id)
);

The idea is that the keys of the following column are the IDs of the users a particular user is following. Lookups should then be easy, since we can use SQL to check whether a user is following someone, like so:

{
  "user_id_2": true
}

SELECT following -> 'user_id_2' FROM follower WHERE user_id = xxx;

or

SELECT following ? 'user_id_2' FROM follower WHERE user_id = xxx;

The above query returns true if user xxx is following user_id_2. Basically, I'm suggesting using the JSONB column type like a set.

What do you guys think? Would this be fast? Memory wise both approaches should be the same?
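
For completeness, a sketch of how the set would be maintained under this scheme, using standard JSONB operators (IDs hypothetical):

```sql
-- Follow: add the key (jsonb_set creates it, or overwrites it if present)
UPDATE follower
SET following = jsonb_set(following, '{7}', 'true'::jsonb)
WHERE user_id = 42;

-- Unfollow: the - operator removes a top-level key
UPDATE follower
SET following = following - '7'
WHERE user_id = 42;
```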

User sticky bit linked a proposed duplicate of my question, but let me address why I think it's not applicable to what I'm saying. First, I'm talking about a JSON object, not an array. Now let me go through the points from the proposed duplicate one by one:

  1. Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5 -- Actually, you can check the type whenever you add a new key.
  2. Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity. -- Not necessary in my use case.
  3. Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5 -- duplicate keys simply overwrite one another, so uniqueness is automatic.
  4. Can’t delete a value from the list without fetching the whole list. -- can delete easily
  5. Can't store a list longer than what fits in the string column. -- true; JSONB has a size limit of 255 MB (see Size limit of JSON data type in PostgreSQL).
  6. Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL: idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\b2\b' -- easy to search whether a key exists, I've added queries for it already
  7. Hard to count elements in the list, or do other aggregate queries. -- easy to count
  8. Hard to join the values to the lookup table they reference. -- true
  9. Hard to fetch the list in sorted order. -- inapplicable
  10. Hard to choose a separator that is guaranteed not to appear in the values -- inapplicable
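
A sketch of the operations behind points 3, 4, 6 and 7, using standard Postgres JSONB operators (IDs hypothetical; a GIN index on following would be needed to make the ? search in point 6 efficient):

```sql
-- 3. Duplicate keys collapse: object concatenation keeps the last value
SELECT '{"7": true}'::jsonb || '{"7": true}'::jsonb;   -- {"7": true}

-- 4. Delete a single key without rewriting the list by hand
UPDATE follower SET following = following - '7' WHERE user_id = 42;

-- 6. Find everyone following user 7
SELECT user_id FROM follower WHERE following ? '7';

-- 7. Count how many users user 42 is following
SELECT count(*)
FROM follower, jsonb_object_keys(following)
WHERE user_id = 42;
```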

I suppose the main problem with my suggestion is the 255 MB size limit on a JSONB column.

Ayudh
  • You *could*, but you *shouldn't*. One important ability you'd lose is ensuring referential integrity via simple and fast foreign key constraints. You'd need to implement that via complicated and likely slower triggers. Joins will be more difficult and probably slower, etc. It's basically like storing a delimited list, just in a slightly more buffed-up way. Have a look at the dupe. – sticky bit Jan 06 '22 at 08:16
  • Well to be honest, it doesn't because I'm not storing a comma delimited list, I'm storing a key-value object which makes most of the points against storing lists inapplicable – Ayudh Jan 06 '22 at 08:31
  • It doesn't matter that the referenced IDs are keys in JSON. For the DBMS, they are first and foremost substrings in a larger string. Accessing them might be a little neater since JSON operators are provided, but the most important disadvantages (which I also outlined in my first comment, BTW) still apply. (In relational databases, that is. You might actually want to look at NoSQL databases, if they better fit your needs.) – sticky bit Jan 06 '22 at 08:46
  • I'm closing the question, as I've found my answer in the 255 MB size limit on the JSONB column. – Ayudh Jan 06 '22 at 08:47
  • Regarding your edit: No doubt you can (re)implement most if not all of the features with more or less complex queries, triggers, etc., and a small fraction may come out of the box as a side effect. But then you'd give up most of the power of a relational DBMS, which already implements all these features *efficiently* and *well tested*. So the question then is: why would you even want to use a relational DBMS, if you actually don't want or need to use its capabilities? – sticky bit Jan 06 '22 at 08:54
  • To answer your question, I point you to my original questions: `What do you guys think? Would this be fast? Memory wise both approaches should be the same?` – Ayudh Jan 06 '22 at 09:09

0 Answers