183

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.

I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).

I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.

The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)

Detailed description: I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:

  • comment on any of those 3 tables

  • mark any of them as liked

  • tag any of them with some tag

  • I also want to count the number of likes for every element and the number of times that particular tag was used.

1st approach:

a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.

b) The same counts for comments.

c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...

I really don't like this design for the last part, it smells badly for me ;)


2nd approach:

I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:

a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:

typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)

and for places:

typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)

and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...

At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:

  1. in element (Photo/Article/Place) table
  2. by select count().

I hope that my explanation of the issue is more thorough now.

Angel Politis
  • 10,955
  • 14
  • 48
  • 66
Kokos
  • 2,143
  • 3
  • 17
  • 16

7 Answers7

237

The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.

Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:

Category

Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:

ER Diagram


BTW, there are roughly 3 ways to implement the "ER category":

  • All types in one table.
  • All concrete types in separate tables.
  • All concrete and abstract types in separate tables.

Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • 3
    great answer, thank you. I hope, I will manage to implement it... and I wonder how Django ORM will handle to map it (or how I will do that by myself...but, that is the other problem ;) ) But, can you explain me, cause I think I do not understand it properly - what you have drawn for me (thanks!) is the third approach you mentioned? – Kokos Nov 13 '11 at 17:05
  • 3
    @Kokos Essentially, the approach (3) means that ENTITY is a table, PHOTO is a table, ARTICLE is a table and PLACE is a table. The approach (2) would mean there is **no** table for ENTITY and the approach (1) would mean there is only one table. The existence of all these approaches (all with their strengths and weaknesses) is the unfortunate consequence of the fact that a typical RDBMS does not support table inheritance natively. – Branko Dimitrijevic Nov 13 '11 at 17:19
  • @Branko This is excellent.. One question.. I'm confused... You have ENTITY table which is linked to all photos, articles (tables) etc... I need to use GUID data type for PK Entity ID in ENTITY and all categories because for example you can have Photo with ID 1 and Place with ID 1? Or I missed the point.. tnx... – rjovic Jan 05 '12 at 15:07
  • @rjovic Well, it doesn't really matter what type is used for ENTITY_ID, as long as it is unique at the level of all subtypes and not just one subtype. GUIDs by their nature fulfill that criteria, but for integers, this would mean you can't have PHOTO 1 **and** ARTICLE 1, but you could have the PHOTO 1 and ARTICLE 2. If overlapping subtype-specific key values are important for some reason, you could always create a subtupe-specific alternate key (i.e. UNIQUE constraint). Or, you could add a type discriminator to the PK (e.g. {ENTITY_TYPE, ENTITY_ID}), if you don't mind some redundancy. – Branko Dimitrijevic Jan 05 '12 at 21:48
  • how about if one of photos need to be profile picture of user? let assume user is the owner of photo entity. What's the best approach? – Fatih Donmez Feb 03 '13 at 17:14
  • 1
    @tylerdurden Probably the simplest way is to make a FOREIGN KEY in USER referencing PHOTO. This would work independently from whether user is already connected to the photo via LIKED_ENTITY. If you want to ensure user's photo must **also** be linked through LIKED_ENTITY - well this is a different "can of worms" and not easy to enforce purely through declarative means. – Branko Dimitrijevic Feb 03 '13 at 19:39
  • @BrankoDimitrijevic let say you have foreign key owner_id in entity referencing user. photo is also an entity, then having entity_id in user as a foreign key referencing photo is correct? – Fatih Donmez Feb 04 '13 at 07:08
  • @tylerdurden I don't know, depends on what you want to accomplish. I don't see an obvious reason why you couldn't have both, if that's what you're asking. – Branko Dimitrijevic Feb 04 '13 at 10:57
  • 1
    Thank you for this great explanation. Howerver, is there an efficient way to get all user's liked or commented objects, i.e. I can easily get all entitiy ids but then I should run one query for photo, one for article, place etc. searching for the same group of entity ids inside each table? Doesn't seem to me an efficient way to do that. Any suggestions? – Fabio Cionini Dec 10 '13 at 14:15
  • @fabio.cionini You don't need to JOIN with ENTITY - you can JOIN directly with PHOTO (or ARTICLE or PLACE), so this model is not less efficient for querying than "classical" model with independent tables. – Branko Dimitrijevic Dec 10 '13 at 14:41
  • @BrankoDimitrijevic how do you ensure that the PK from ENTITY_ID which will be an Identity Specification is inserted into the Entity table? Is this done on the application side rather than DB side? – volume one Sep 16 '14 at 14:27
  • @volumeone This is part of the procedural logic. You first insert ENTITY and get the auto-generated ENTITY_ID, and then reuse the same value for FKs that reference it. The "procedural logic" can be a series of client calls, or a stored procedure (and anything in between). – Branko Dimitrijevic Sep 16 '14 at 15:38
  • 2
    @BrankoDimitrijevic Why can't the entity tables Photo, Article, Place have their own PK e.g. PhotoID, ArticleID etc but also have another column for the Entity_ID as a FK? Is this unnecessary? – volume one Sep 17 '14 at 21:03
  • 1
    @volumeone They can, but is unnecessary. – Branko Dimitrijevic Sep 17 '14 at 22:44
  • @BrankoDimitrijevic What happens if I have lots of inherited tables? `BIGINT` may be not big enough for `ENTITY_ID`, isn't it? – Lewis Oct 16 '14 at 11:53
  • 4
    @Orion The maximum for `BIGINT` is 9223372036854775807. Assuming you insert one row each second, you will run-out of available values in ~300 billion years. Surely, you will be able to port to 128-bit integers by then! – Branko Dimitrijevic Oct 16 '14 at 13:14
  • @BrankoDimitrijevic I like your statistics, and also your `128 bit GIANTINT`. BTW, since `ENTITY` table is just for IDs, why don't you use `GUID` instead? – Lewis Oct 16 '14 at 14:19
  • 1
    @Orion GUIDs are larger (hurting storage and caching), usually fill B-Trees to a lesser degree (again wasting space and hurting cache) and don't play well with clustering (table clustering not database clustering). They have their place, but probably not here. – Branko Dimitrijevic Oct 16 '14 at 15:25
  • Assume that `COMMENT`,`ARTICLE` and `PLACE` has photos, will you still store photos in a single table or separate them into three respective ones? – Lewis Jun 26 '15 at 15:49
  • @Tresdin That's really a different problem, orthogonal to inheritance. Tell me more about how you intend to use these photos, and I might be able to tell you more about how to store them... – Branko Dimitrijevic Jun 29 '15 at 04:01
  • @BrankoDimitrijevic Those photos are initially commentable and likeable. But I also want to make them shareable and bookmarkable in the future. – Lewis Jun 29 '15 at 04:28
  • 1
    @Tresdin Then it looks like you need to keep the hierarchy above, and link from other tables to photos appropriately... As you add support for more features at the entity level, they will "propagate" to photos. – Branko Dimitrijevic Jun 29 '15 at 06:05
  • @BrankoDimitrijevic - Just to get things clear. If I were to insert a *new* article, I would create a new entity for that one article right? That means that the article would be tied to that one `entityId` – James111 Apr 11 '16 at 06:28
  • @James111 Yes. You first insert a new ENTITY row (which somehow generates a new ENTITY_ID, usually through auto-incremented field, or sequence or whatever mechanism is available in your DBMS). You then use the _same_ ENTITY_ID for a new ARTICLE row. – Branko Dimitrijevic Apr 11 '16 at 08:32
  • @BrankoDimitrijevic 1-Will above structure support Facebook like comments(Hierarchical Data Structure) ? 2- If yes how to implement it in MySQL because it does not support Hierarchical Data Structure? – Subodh Joshi Sep 26 '16 at 17:02
  • @BrankoDimitrijevic What does it mean `COMMENT_NO` in `ENTITY_COMMENT` table? Is it a primary key of commented entity? – Matt Oct 10 '17 at 08:21
  • @Matt It's a **part** of the composite primary key (together with ENTITY_ID). By convention, attributes above the horizontal line in the entity box form the primary key. – Branko Dimitrijevic Oct 10 '17 at 09:13
  • Great table design. The only (semantic) issue I'm having with it, it implementing it into a Symfony project structure. Not an issue exactly, I'm just struggling like hell to find a fitting alternative to the word `Entity` as `Entity` is used as a generic term to blanket all entities within the project, I don't want any confusion to be caused by using an Entity of the name Entity. Also the namespace App\Entity\Entity just feels wrong. Any suggestions? – Doug Dec 18 '17 at 19:28
  • Also I'm slightly confused by the N:N between Users and Entities. I understand a `User` can have `Many Entities`, however why would an `Entity` have many `Users`? – Doug Dec 18 '17 at 20:04
  • 1
    @Doug The "like" relationship is N:N because one entity can be liked by multiple users and one user can like multiple entities. As for naming, I can suggest terms such as "object", "item", "element" or even "post" or "contribution"... At the end of the day, you are the one to decide, based on what fits best with your specific project. – Branko Dimitrijevic Dec 19 '17 at 06:45
  • @Branko Dimitrijevic thank you for the clarification on the n:n table. As for the naming I put Entity into a thesaurus after I asked the question and after much clicking around ended up on element, which fits nicely. – Doug Dec 19 '17 at 06:47
  • Should the Entity table also have UserID as field? Or was Entity not created by the user in your example? – Just a coder Nov 18 '20 at 14:12
  • 1
    @FlowUI.SimpleUITesting.com Yes, if we know that _all_ types of entities will be created by a user (as opposed to by some automated process, for example). Otherwise, only those types of entities that are created by a user would have USER_ID. – Branko Dimitrijevic Nov 18 '20 at 19:11
  • I've just stumbled upon this answer. Can someone please explain to me why we are combining pk's and fk's in this way e.g. ```PK, FK1, PK, FK2``` in LIKED_ENTITY and ```PK, FK1, PK``` in ENTITY_COMMENT? There are similar such markings in other tables. – Radheya Jan 02 '21 at 15:45
  • @Dhruvify This is just how the tool that I used to produce the diagrams marks these constraints. In case of the LIKED_ENTITY, there is a composite primary key on ENTITY_ID and USER_ID, one foreign key on ENTITY_ID and another foreign key on USER_ID. You get the picture... – Branko Dimitrijevic Jan 04 '21 at 06:23
  • @BrankoDimitrijevic Oh okay. i got it this one. I am still wondering that won't it take too long every time tables get queried from multiple users? e.g. if there are one million users and if we have to show multiple likes and comments on posts in each of their timeline when homepage loads like fb, the website can crash right? or this architecture can still work flawlessly in such case? – Radheya Jan 04 '21 at 09:27
  • 1
    @Dhruvify A well optimized relational database can scale to millions of users - the StackOverflow itself runs on SQL Server. The key is in careful implementation and "performance is a feature" mentality. But the technical details of DB performance are far outside a simple comment like this. If you are interested, I'd recommend taking a look at https://use-the-index-luke.com/ – Branko Dimitrijevic Jan 04 '21 at 09:54
  • @BrankoDimitrijevic thanks for the link. I am trying to implement this model in my system but facing some problems. for e.g. let's say I want to get all likes on any photo with this architecture, can this pseudocode work: ```select * from liked_entity where entity_id='some_photo_entity_id' and user_id='some_user_entity_id' ``` or is there a better way? – Radheya Jan 07 '21 at 05:41
  • 1
    @Dhruvify If you want _all_ likes of a specific photo (not just one specific like), then omit `user_id='some_user_entity_id'`. – Branko Dimitrijevic Jan 07 '21 at 13:29
30

Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.

Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.

If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.

You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.

wallyk
  • 56,922
  • 16
  • 83
  • 148
  • 3
    Because I need to implement it by myself. At least for now... and, I thought that maybe it is a good occasion to start liking a databases a little bit ;) Thank you about your suggestion with stored procedure. Do someone know, if they are mapped by Django ORM automatically ? – Kokos Nov 13 '11 at 16:37
25

This is a general idea please don´t pay much attention to the field names styling, but more to the relation and structure

enter image description here

This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"

This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)

SELECT * FROM actions  
WHERE actions.id_Stuff = 5  
AND actions.typeStuff="photo"  
AND actions.typeAction = "like"  
user964260
  • 307
  • 2
  • 7
  • I think you may even like comments, as, clicking a "like" link in a comment. This query will get the likes of a comment(action) with ID 133: `SELECT * FROM actions WHERE actions.id=133 AND actions.typeStuff = "comment" AND actions.typeAction = "like" ` – user964260 Nov 15 '11 at 03:35
  • 1
    I will definitely remember this solution for further releases of my system :) – Kokos Nov 19 '11 at 23:09
  • I have 2 stuff tables stuff1 and stuff2...I followed this diagram but there is sql error while using this...stuff1, stuff2 are two independent tables with their independent primary keys, and action table has a column id_stuff which is referencing to these two tabels stuff1, stuff2. Now for example stuff1 has 5 rows, stuff2 has 10 rows, when I try to add row in action table with id_stuff anything less than 5 lets say '3' it executes query because there exist a row with id_stuff '3' in both stuff1 and stuff2, but if I try to add row with id_stuff greater than 5 ...(continue to next comment) – vikas devde Apr 10 '13 at 11:22
  • and less than 10 lets say '7' it shows error because id_stuff in action table is referencing id_stuff in stuff1,stuff2 and even though stuff2 has a row with stuff_id '7', stuff1 doen't have it, and as it is referencing to stuff1 also, it creates an error, how can I solve this problem? – vikas devde Apr 10 '13 at 11:23
  • 2
    If one is to implement likes this way, it makes notifying the user of the new likes more difficult. It would require another table. – Greg L Mar 07 '14 at 21:05
  • 4
    How will the `id_stuff` column contain unique values in each of three tables? – volume one Sep 16 '14 at 14:34
  • @volumeone If I understand what you're saying, I think a combination of `id_stuff` and `typeStuff` will make up the table key in that case. – aderchox Jan 25 '19 at 21:06
2

Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.

One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.

One big table with dictionary may become uncontrollable one day.

Oroboros102
  • 2,214
  • 1
  • 27
  • 41
  • More tables means it will be less maintainable. Individual tables can be sharded by most d/bs. – wallyk Aug 02 '17 at 21:20
1

as far as i understand. several tables are required. There is a many to many relation between them.

  • Table which stores the user data such as name, surname, birth date with a identity field.
  • Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
  • each different data type has its table. for example, status updates, photos, links.
  • the last table is for many to many relation storing an id, user id, data type and data id.
erencan
  • 3,725
  • 5
  • 32
  • 50
1

Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?

If not favour the one that requires the fewer tables

In this case:

  1. Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
  2. Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
  3. Find comments by a person about one kind of thing: simple query in either case
  4. Find all comments by a person about all things: this seems little gnarly either way.

I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.

djna
  • 54,992
  • 14
  • 74
  • 117
0

Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

nobody
  • 10,599
  • 4
  • 26
  • 43