I'm building a site similar to Yelp (Recommendation Engine, on a smaller scale though), so there will be three main entities in the system: User, Place (includes businesses), and Event.

Now what I'm wondering about is how to store information such as photos, comments, and 'compliments' (similar to Facebook's "Like") for each of these entity types, and also for each object they can be applied to (e.g. a comment on a recommendation, a photo, etc.). Right now I'm using a single table for each, i.e.

Photo (id, type, owner_id, is_main, etc...)
where type represents: 1=user, 2=place, 3=event

Comment (id, object_type, object_id, user_id, content, etc, etc...)
where object_type can be a few different objects like photos, recommendations, etc

Compliment (object_id, object_type, compliment_type, user_id)
where object_type can be a few different objects like photos, recommendations, etc

Activity (id, source, source_type, source_id, etc..) //for "activity feed"
where source_type is a user, place, or event

Notification (id, recipient, sender, activity_type, object_type, object_id, etc...)
where object_type & object_id will be used to provide a direct link to the object of the notification e.g. a user's photo that was complimented
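
A minimal sketch of the single-table layout above, assuming SQLite through Python's `sqlite3` module (the question does not name a database, and the trimmed-down columns are only illustrative). Because `object_id` may point at more than one table, it cannot be declared as a foreign key, so the database accepts a comment on a photo that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE user  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE photo (id INTEGER PRIMARY KEY, type INTEGER NOT NULL, owner_id INTEGER NOT NULL);
-- object_id can refer to photo, recommendation, ... so no FOREIGN KEY can be declared on it
CREATE TABLE comment (
    id          INTEGER PRIMARY KEY,
    object_type TEXT    NOT NULL,
    object_id   INTEGER NOT NULL,
    user_id     INTEGER NOT NULL REFERENCES user(id),
    content     TEXT    NOT NULL
);
""")
conn.execute("INSERT INTO user (id, name) VALUES (1, 'Bob')")

# No photo with id 999 exists, yet this insert succeeds.
conn.execute(
    "INSERT INTO comment (object_type, object_id, user_id, content) "
    "VALUES ('photo', 999, 1, 'very nice')"
)

# Orphans can only be found after the fact, with a query per object type.
orphans = conn.execute("""
    SELECT c.id
    FROM comment c
    LEFT JOIN photo p ON p.id = c.object_id
    WHERE c.object_type = 'photo' AND p.id IS NULL
""").fetchall()
print(orphans)
```

Only `user_id` is protected here; the link from a comment to its target has to be checked by application code or by cleanup queries like the one above.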

But after reading a few posts on SO, I realized I can't maintain referential integrity with a foreign key, since that requires a 1:1 relationship and my source_id/object_id fields can relate to an ID in more than one table. So I decided to keep the main entity, but break it into subsets, i.e.

User_Photo (photo_id, user_id) | Place_Photo(photo_id, place_id) | etc...

Photo_Comment (comment_id, photo_id) | Recommendation_Comment(comment_id, rec_id) | etc...

Compliment (id, ...) //would need to add a surrogate key to Compliment table now

Photo_Compliment(compliment_id, photo_id) | Comment_Compliment(compliment_id, comment_id) | etc...

User_Activity(activity_id, user_id) | Place_Activity(activity_id, place_id) | etc...

I was thinking I could just create views joining each sub-table to the main table to get the results I want. Plus I'm thinking it would fit into my object models in Code Igniter as well.
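
A minimal sketch of one such sub-table and its view, again assuming SQLite through Python's `sqlite3`. The view name `v_user_photo` and the reduced column lists are hypothetical; only `Photo`, `User_Photo`, `photo_id`, `user_id`, and `is_main` come from the layout above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE user  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE photo (id INTEGER PRIMARY KEY, is_main INTEGER NOT NULL DEFAULT 0);

-- One sub-table per owner type; both columns are real foreign keys now.
CREATE TABLE user_photo (
    photo_id INTEGER PRIMARY KEY REFERENCES photo(id) ON DELETE CASCADE,
    user_id  INTEGER NOT NULL    REFERENCES user(id)  ON DELETE CASCADE
);

-- The view joins the sub-table back to the main table.
CREATE VIEW v_user_photo AS
    SELECT p.id AS photo_id, p.is_main, up.user_id
    FROM photo p
    JOIN user_photo up ON up.photo_id = p.id;
""")
conn.execute("INSERT INTO user (id, name) VALUES (1, 'Bob')")
conn.execute("INSERT INTO photo (id, is_main) VALUES (56, 1)")
conn.execute("INSERT INTO user_photo (photo_id, user_id) VALUES (56, 1)")

rows = conn.execute("SELECT photo_id, is_main, user_id FROM v_user_photo").fetchall()

# Linking a photo that does not exist is rejected by the database itself.
try:
    conn.execute("INSERT INTO user_photo (photo_id, user_id) VALUES (57, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rows, rejected)
```

Making `photo_id` the primary key of `user_photo` is a design choice in this sketch: it limits each photo to one user, which matches a photo having a single owner.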

The only table I think I could leave is the notifications table, since there are many object types (forum post, photo, recommendation, etc, etc), and this table will only hold notifications for a week anyway so any ref integrity issues shouldn't be much of a problem (I think).

So am I going about this in a sensible way? Any performance, reliability, or other issues that I may have overlooked?

The only "problem" I can see is that I would end up with a lot of tables (right now I have about 72, so I guess I would end up with a little under 90 after I add the extras), and that's not an issue as far as I can tell.

Really grateful for any kind of feedback. Thanks in advance.

EDIT: Just to be clear, I'm not concerned if I end up with another 10 or so tables. From what I know, the number of tables isn't too much of an issue (once they're being used)... unless you had say 200 or so :/

Ray

3 Answers

Some propositions for this UoD (universe of discourse)

  • User named Bob logged in.
  • User named Bob uploaded photo number 56.
  • There is a place named London.
  • Photo number 56 is of place named London.
  • User named Joe created comment "very nice" on photo number 56.

To introduce object IDs

  • User (UserID) logged in.
  • User (UserID) uploaded Photo (PhotoID).
  • There is Place (PlaceID).
  • Photo (PhotoID) is of Place (PlaceID).
  • User (UserID) created Comment (CommentID) on Photo (PhotoID).

Just Fact Types

  • User logged in.
  • User uploaded Photo.
  • Place exists.
  • Photo is of Place.
  • User created Comment on Photo.

Now to extract predicates

Predicate               Predicate Arity
---------------------------------------------
... logged in            1 (Unary predicate)
... uploaded ...         2 (Binary)
... exists               1 (Unary) 
... is of ...            2 (Binary)
... created ... on ...   3 (Ternary)

It looks like each proposition in this UoD may be stated with at most a ternary predicate, so I would suggest something like

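[Image not available: diagram of the suggested model; the text and comments below refer to its Resource, Predicate, and Proposition tables.]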

Predicate role (Role_1_ID, Role_2_ID, Role_3_ID) is a part that an object plays in a predicate. Substitute the ... in a predicate from left to right with each Role_ID. Note that only Role_1_ID is mandatory (at least unary predicate), the other two may be NULL.

In this simple model, it is possible to propose anything. Hence, you would need to implement constraints in the application layer. For example, you have to make sure that it is possible to create Comment on Place, but not create Place on Place. Not all predicates represent an action; for example, ... logged in is an action while ... is of ... is not. So, your activity feed would list all Propositions with Predicate.IsAction = True.
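
Since the original diagram is not available, the following is only a guess at the shape of the model, written for SQLite through Python's `sqlite3`. The names `Resource`, `ResourceType`, `Proposition`, `Role_1_ID` to `Role_3_ID`, and `Predicate.IsAction` appear in this answer or its comments; `PredicateID`, `PropositionID`, `Template`, and the type codes `'pho'` and `'plc'` are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE Resource  (ResourceID INTEGER PRIMARY KEY, ResourceType TEXT NOT NULL);
CREATE TABLE Predicate (
    PredicateID INTEGER PRIMARY KEY,
    Template    TEXT    NOT NULL,   -- hypothetical column, e.g. '... uploaded ...'
    IsAction    INTEGER NOT NULL    -- 1 = action, 0 = plain fact
);
CREATE TABLE Proposition (
    PropositionID INTEGER PRIMARY KEY,
    PredicateID   INTEGER NOT NULL REFERENCES Predicate(PredicateID),
    Role_1_ID     INTEGER NOT NULL REFERENCES Resource(ResourceID),  -- mandatory
    Role_2_ID     INTEGER REFERENCES Resource(ResourceID),           -- may be NULL
    Role_3_ID     INTEGER REFERENCES Resource(ResourceID)            -- may be NULL
);
""")
conn.executemany("INSERT INTO Resource VALUES (?, ?)",
                 [(1, "usr"), (2, "pho"), (3, "plc")])
conn.executemany("INSERT INTO Predicate VALUES (?, ?, ?)",
                 [(1, "... uploaded ...", 1), (2, "... is of ...", 0)])
conn.executemany(
    "INSERT INTO Proposition (PredicateID, Role_1_ID, Role_2_ID, Role_3_ID) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 2, None),   # User 1 uploaded Photo 2
     (2, 2, 3, None)],  # Photo 2 is of Place 3
)

# Activity feed: only propositions whose predicate is an action.
feed = conn.execute("""
    SELECT pr.Template, p.Role_1_ID, p.Role_2_ID, p.Role_3_ID
    FROM Proposition p
    JOIN Predicate pr ON pr.PredicateID = p.PredicateID
    WHERE pr.IsAction = 1
    ORDER BY p.PropositionID
""").fetchall()
print(feed)
```

The foreign keys only guarantee that each role refers to some existing resource; which resource types may fill which role still has to be checked by the application, as described above.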

Damir Sudarevic
  • @Marius Burz; for example? `Proposition` table is many-to-many in between any two resources. – Damir Sudarevic Aug 31 '11 at 19:23
  • This is a pretty interesting OO approach, but I think it may be too generic for my case. It's not going to be a huge open system where many types of objects can be added (minor exceptions might be in the case of what can be commented on or complimented, but those are still limited anyways) – Ray Aug 31 '11 at 19:24
  • @Ray, the smaller the easier. – Damir Sudarevic Aug 31 '11 at 19:26
  • @Marius Burz; That would violate **FK** `Proposition.Role_1_ID`. You would have to delete in `User` table directly and leave it in `Resource`. – Damir Sudarevic Aug 31 '11 at 20:08
  • @Marius Burz; When creating a new **user**, the app actually creates new `Resource` with `ResourceType = 'usr'` and then uses the new generated `ResourceID` to insert into `User` table. When deleting a **user**, the app deletes from `Resource`, the FK in `User` has `ON DELETE CASCADE`. The app does not handle sub-type tables independently of `Resource`. This is standard super-type/subtype; nothing special here. – Damir Sudarevic Aug 31 '11 at 20:27
  • @Damir: ooops... you're on the same track as Joel (sort of). I overlooked that every resource table actually uses a PK from `Resource` as its own PK. My bad, sorry again. – Marius Burz Aug 31 '11 at 20:35

If you rearrange things slightly, you can simplify your comments and compliments. Essentially you want to have a single store of comments and another one of compliments. Your problem is that this won't let you use declarative referential integrity (foreign key constraints).

The way to solve this is to make sure that the objects that can attract comments and compliments are all logical sub-types of one supertype. From a logical perspective, it means you have a "THING_OF_INTEREST" entity (I'm not making a naming convention recommendation here!) and each of the various specific things which attract comments and compliments will be a sub-type of THING_OF_INTEREST. Therefore your comments table will have a "thing_of_interest_id" FK column, and similarly for your compliments table. You will still have the sub-type tables, but they will have a 1:1 FK with THING_OF_INTEREST. In other words, THING_OF_INTEREST does the job of giving you a single primary key domain, whereas all of the sub-type tables contain the type-specific attributes. In this way, you can still use declarative referential integrity to enforce your comment and compliment relationships without having to have separate tables for different types of comments and compliments.

From a physical implementation perspective, the most important thing is that your various things of interest all share a common primary key domain. That's what lets your comment table have a single FK value that can be easily joined with whatever that thing of interest happens to be.

Depending on how you go after your comments and recommendations, you probably will (but may not) need to physically implement THING_OF_INTEREST - which will have at least two attributes, the primary key (usually an int) plus a partitioning attribute that tells you which sub-type of thing it is.
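
A minimal sketch of this supertype/sub-type layout, assuming SQLite through Python's `sqlite3`. Only `THING_OF_INTEREST` and `thing_of_interest_id` come from the text above; the `thing_type` column, the sub-type columns, and the `add_photo` helper are hypothetical names chosen for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
-- The supertype: a shared primary key domain plus a partitioning attribute.
CREATE TABLE thing_of_interest (
    id         INTEGER PRIMARY KEY,
    thing_type TEXT NOT NULL CHECK (thing_type IN ('photo', 'recommendation'))
);
-- Sub-types: their PK is also a 1:1 FK to the supertype.
CREATE TABLE photo (
    id      INTEGER PRIMARY KEY REFERENCES thing_of_interest(id) ON DELETE CASCADE,
    caption TEXT
);
CREATE TABLE recommendation (
    id   INTEGER PRIMARY KEY REFERENCES thing_of_interest(id) ON DELETE CASCADE,
    body TEXT
);
-- One comment table for everything, with a single declared FK.
CREATE TABLE comment (
    id                   INTEGER PRIMARY KEY,
    thing_of_interest_id INTEGER NOT NULL
        REFERENCES thing_of_interest(id) ON DELETE CASCADE,
    content              TEXT NOT NULL
);
""")

def add_photo(conn, caption):
    """Insert into the key pool first, then reuse the generated key for the sub-type row."""
    cur = conn.execute("INSERT INTO thing_of_interest (thing_type) VALUES ('photo')")
    conn.execute("INSERT INTO photo (id, caption) VALUES (?, ?)", (cur.lastrowid, caption))
    return cur.lastrowid

photo_id = add_photo(conn, "London")

# Commenting on a photo is a single insert.
conn.execute("INSERT INTO comment (thing_of_interest_id, content) VALUES (?, ?)",
             (photo_id, "very nice"))

# A comment on a non-existent thing is rejected by the FK.
try:
    conn.execute("INSERT INTO comment (thing_of_interest_id, content) VALUES (999, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the supertype row removes the sub-type row and its comments.
conn.execute("DELETE FROM thing_of_interest WHERE id = ?", (photo_id,))
remaining = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(orphan_rejected, remaining)
```

Note that nothing in this sketch forces a row in `photo` to have `thing_type = 'photo'` in the supertype; if that matters, it needs an extra constraint or a check in the application.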

Joel Brown
  • Something like this works fine for very few tables when they are rather related to each other (say `questions` and `answers` with a common table `posts`) but not in this case. This feels like designing the system around `Comment` and `Compliment` by making `User`, `Place` and `Event` all "descend" from a common parent. It might work for now, but what if you get yourself another 10 things of interest down the road? It just gets awkward, even more so than by ending up with a huge amount of tables. – Marius Burz Aug 30 '11 at 20:26
  • If the only sense in which they descend from a common parent is that the parent supplies the list of unique surrogate primary key values, then there is nothing awkward about it at all from a design standpoint, and it can only become a physical implementation issue if this "PK well" becomes an insert hotspot. Even if this does happen there are ways around it. I therefore respectfully disagree with your view. – Joel Brown Aug 30 '11 at 22:25
  • I'll have to sleep on it; it's a challenging proposition you've got there, since I never thought of using it for so many tables. It has its merits nonetheless, but I still have to come to terms with it and "discover" why I still feel reserved about it. – Marius Burz Aug 30 '11 at 22:55
  • This design pattern is commonly used as the database implementation of the class inheritance pattern used by application code. With this context, my proposed solution makes no sense because there is no reasonable way to consider many drastically different tables as being subclasses of the same supertype. However, if you set the common use aside, you will see that this database design pattern can also serve as the solution to OP's (self-imposed) problem of finding a trade-off between keeping the number of tables down without compromising on use of DRI. – Joel Brown Aug 31 '11 at 12:20
  • I kinda like this solution, except as you mentioned there is no "logical" supertype of all those different entities... this method actually seems similar to @Marius Burz's method of using many tables, in that for each comment you add, say, you have to make insertions into two tables, e.g. a comment on a photo would be added to comments, then you get the last_id and insert that into photo_comments. *oh btw, I'm not too concerned with having too many tables since I doubt I will even go past 100 in the worst case.* – Ray Aug 31 '11 at 19:31
  • Ray, I guess I should have drawn a picture. My suggestion is not to have separate comments and photo comments tables. The idea is to have one table which contains your key pool, and then tables for your photos, activities, etc. You only have one comment table and one compliment table. When you insert a photo or an activity, you would be inserting into the key pool and into the photo or activity. When someone comments on a photo there would be only one insert. In this way my suggestion is not really like Marius Burz's at all, but closer to what Damir has recently suggested. – Joel Brown Aug 31 '11 at 20:00
  • @Joel after sitting on this for a day I must say I could come to terms with it. The only reservation I still have is the fact that there is a central node(`THING_OF_INTEREST`) and when it comes to scaling this is pretty bad. Aside from that, you already have a deserved +1 from me. – Marius Burz Aug 31 '11 at 21:12
  • @marius-burz I agree that this solution would have to be watched carefully regarding scalability and I'm not suggesting that I would use this design myself, although I have seen it used. I also agree with OP that table sprawl ought not to be an overriding consideration, although in fairness to me, I will point out that OP's question did use the word "problem" in the context of ending up with a lot of tables. As with so many things in system design, there is rarely a right answer, merely a best trade-off based on one's chosen design goals and priorities. – Joel Brown Aug 31 '11 at 21:58
  • Joel I think we had some miscommunication along the lines lol, probably shouldn't have used the word 'problem'... but I do get what you are saying, to use a 'SuperType' similar to Damir's solution. I really had a tough time choosing between this method and @Marius method (If I had the points I would have voted all you guys up :)) but I think in the end I'll go with Marius since the Supertype method seems more a fit for a more generalized type social network situation where I could comment/'Like' almost anything... thanks very much for all the feedback guys :D – Ray Sep 01 '11 at 02:31

If you need referential integrity (RI), there is no better way to do it than to use many-to-many junction tables. True, you end up with a lot of tables in the system, but that's the cost you need to pay. Going this route also has some other benefits; for instance, you get a sort of partitioning for free: the data is partitioned by relation type, each in its own table. This offers RI, but it is not 100% safe either; for instance, nothing guarantees that a comment belongs to a photo and to that photo alone. You'd need to enforce these kinds of constraints manually should you need them.
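
A minimal sketch of that gap, assuming SQLite through Python's `sqlite3` and the `Photo_Comment` / `Recommendation_Comment` tables from the question (column lists are trimmed and illustrative). A key on `comment_id` inside one junction table can tie a comment to a single photo, but no declared constraint stops the same comment from also being attached to a recommendation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE comment        (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE photo          (id INTEGER PRIMARY KEY);
CREATE TABLE recommendation (id INTEGER PRIMARY KEY);

-- comment_id as PRIMARY KEY: a comment can appear at most once per junction table.
CREATE TABLE photo_comment (
    comment_id INTEGER PRIMARY KEY REFERENCES comment(id),
    photo_id   INTEGER NOT NULL    REFERENCES photo(id)
);
CREATE TABLE recommendation_comment (
    comment_id INTEGER PRIMARY KEY REFERENCES comment(id),
    rec_id     INTEGER NOT NULL    REFERENCES recommendation(id)
);
""")
conn.execute("INSERT INTO comment (id, content) VALUES (1, 'very nice')")
conn.execute("INSERT INTO photo (id) VALUES (56)")
conn.execute("INSERT INTO recommendation (id) VALUES (7)")

conn.execute("INSERT INTO photo_comment (comment_id, photo_id) VALUES (1, 56)")
# Both foreign keys are satisfied, so the database accepts this as well.
conn.execute("INSERT INTO recommendation_comment (comment_id, rec_id) VALUES (1, 7)")

# The "manual" check: comments attached to more than one kind of object.
shared = conn.execute("""
    SELECT comment_id FROM photo_comment
    INTERSECT
    SELECT comment_id FROM recommendation_comment
""").fetchall()
print(shared)
```

Whether such a check runs in application code, in triggers, or as a periodic audit query is a separate decision; the sketch only shows that the foreign keys alone do not cover it.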

On the other hand, going with a generic solution like you already did gets you off the ground faster and is much easier to extend in the future, but there will be no RI unless you code it manually (which is very complex and a lot harder to deal with than the alternative of an M:M table for every relation type).

Just to mention another alternative, similar to your existing implementation: you could use a custom M:M junction table to handle all your relations regardless of their type: object1_type, object1_id, object2_type, object2_id. It is simple, but it has no benefit other than being very easy to implement and extend. I'd only recommend it if you don't need RI and you have a lot of tables, all interlinked.

Marius Burz
  • Adding many-to-many intersection tables is not solving the OP's problem at all, since m:n RI is not the same as 1:m RI. Unless comments are meant to be shared across multiple instances of photos or - as you point out yourself - across different types of target entities. Furthermore, a system that requires a three-table join in every case is not going to be as efficient as a system (as I suggested) that can often get away with two-table joins. What you are suggesting does not address the OP's apparent concern for table sprawl, nor his desire for using DRI. – Joel Brown Aug 30 '11 at 22:31
  • @Joel Brown I wasn't really concerned with "table sprawl", it's just that I pointed out that it would increase the amount I already had. I think once the tables are being used and useful, then the number of tables shouldn't be too big a deal unless it's a ridiculously large amount :/ – Ray Aug 31 '11 at 19:27
  • There's an old saying that sounds like: a database is allowed to contain as many tables as required, given the database can handle this. Actually having multiple tables can be a plus, especially when the amount of data is huge: you could put them on different disks(even memory if it makes sense), create indexes on a per relation type base(in this case) and have one less very central point (which matters). – Marius Burz Aug 31 '11 at 20:03
  • Agreeing with Marius on this. I have watched at least 100 youtube lectures on MySQL scalability, all of them were about optimizing DB's with lots of rows -- somewhere between few and none of them (depending on how you define sharding, etc.) were about optimizing DB's with lots of tables. – HoldOffHunger Sep 10 '18 at 15:17