1

Got three entities -

  1. User - Has username/password, contact information, billing information etc.
  2. Periodical - Has periodical_name, category, publisher_info, print_cycle, unit_price etc.
  3. Subscription - Has the user-id, periodical-id, subscription start/end date, status etc.

And following relationships -

  • User can have Subscription for multiple Periodicals.
  • Subscription belongs-to a User, and has-a Periodical associated.
  • Periodical has-many Subscriptions associated with it.

s.t.,

  • User--Subscription relation is One-to-Many,
  • Periodical--Subscription relation is One-to-Many,

My questions --

  1. Is this model description correct for the real-world relationships typically found ?

  2. Or, am I better off, collapsing Periodical into Subscription, especially if per Periodical information is not extensive and can be encoded into, say a delimiter separate text field (e.g. "PeriodicalName:Frequency:Publisher:UnitPrice") ?

  3. Can I say that via associativity User--Periodical relation is Many-to-Many ?

  4. Would appreciate if someone can show how I could put a constraint around either User or Subscription table (assuming that I don't collapse Subscription & Periodical), s.t. when a User needs to be removed for some reason, all associated Subscriptions are deleted too.

  5. Would like to keep a User record around for sometime (say a year), even after all Subscriptions have expired, so I guess I can assign NULL to the FK subscription_id in User table, right ? This is when no corresponding record exists in Subscription table.

bdutta74
  • 2,798
  • 3
  • 31
  • 54

2 Answers2

2
  1. Yes.

  2. You are describing "denormalization" which may under rare circumstances be useful but it's a violation of relational design.

  3. yes.

  4. This type of constraint (1-to-1) is not implemented declaratively in most RDBMS products. Instead we have 1-to-zero-or-one. You can do it with triggers but it's tricky and subtle. Google for Object-Role Modeling which is a more comprehensive modeling technique that addresses stuff like this (and "either-or" and "at least 2" and many others.)

  5. That would be the standard way to handle it. You could also leave the subscription with an expiration date. But the 1-to-zero-or-one deficiency nicely facilitates your suggestion.

dkretz
  • 37,399
  • 13
  • 80
  • 138
1
  1. You model is correct. I would implement this as 3 tables User, Periodical, Subscription.

  2. You should not collapse Periodical into Subscription. If a Periodical or some portion of the periodical is the same for all users it goes in that table. Anything that changes by subscription goes in the subscription table.

  3. Yes.

  4. I think this is a bad idea. You are talking about Business logic here. That should not be implemented in the database. Your requirements might change. You can require that all users (FKs) in the subscription table exist -- but you should not have the data tier remove subscriptions when a user is deleted -- instead it will block and say it can't remove because the FK is being used in the subscription table. (Your logic layer will need to delete all subscriptions before it can delete the user.)

  5. You don't need a FK in the user table for subscriptions. The subscription table contains the FK for both user and periodical -- it is what is called a join table. You would need a join when you want to retrieve both user data from the user table and the subscription information. Often you might not need this (for example you know the user id and you just want to list that user's subscription information).

Hogan
  • 69,564
  • 10
  • 76
  • 117
  • Thanks @Hogan. Re Q5, are you saying that Subscription table would be something that has say subscriptionId as PK, and User(userId) + Periodical(periodicalId) as FKs ? In this case, to find all subscriptions that a User has, I do a 'SELECT * FROM Subscription WHERE userId=' ? If so, when would I need an explicit JOIN ? – bdutta74 Nov 05 '11 at 18:19
  • Re Q4, my idea was to ensure database consistency (from DB normalization standpoint). Isn't that important ? – bdutta74 Nov 05 '11 at 18:22
  • @icarus74 - Updated based on your questions. – Hogan Nov 05 '11 at 18:30
  • You're confusing the "what" (logical model) and the "how" (physical). The logical model is built on the business requirements and determines the physical model. And anything can change. – dkretz Nov 05 '11 at 18:31
  • @ledorfier : are you objecting to my use of "Business logic" instead of "Business rules"? – Hogan Nov 05 '11 at 18:33
  • Business logic or business rules are what essential for deriving the logical model which is what OP is asking about so they are the core of what OP is asking when requesting what constraint to use in their model. Whether it's implemented with unique indexes or keys or checks or triggers is physical design (which OP assumes means cascading deletes.). – dkretz Nov 05 '11 at 18:44
  • @ledorfier - based on this answer http://stackoverflow.com/questions/460316/are-database-triggers-evil/460343#460343 I think you agree with what I'm trying to express -- how should I express it? – Hogan Nov 05 '11 at 18:48
  • @Hogan - post your clarification, I think I understand your answer better. One of my major worries was will the JOINs be a significant bottleneck. But given the nature of queries I plan on running (currently), seems like JOINs should be rare if modeled this way. – bdutta74 Nov 05 '11 at 18:56
  • @ledorfier, thanks for taking time to critique. I did indeed mean cascaded deletes to ensure referential integrity of the database. Is there some lucid & brief explanation of the various approaches s.a. unique indexes, keys, checks (I guess in biz logic), triggers in physical design to ensure that ultimately referential integrity is maintained. I am, for sure confused, but that's largely due to rather rusty & limited knowledge of DB design. – bdutta74 Nov 05 '11 at 18:59
  • Your questions are those we have all gone through on the way to wherever we are. Database optimization is too often worried about because queries have a fundamental flaw compared to other code. If you write a bad query in many cases it doesn't fail, it just runs slowly. So people end up adding complexity because they don't know what to fix or how to fix it. The first rule is don't worry about database optimization, worry about getting the simple stuff right. Most of your question is about things that don't matter. Databases are way fast enough not to worry about it. – dkretz Nov 05 '11 at 19:23
  • @icarus74 - RMDBs are designed to joins -- never worry about having to do a join. Worry about sub-queries, this is where slowness most often creeps in. As I said, it makes sense to use an FK constraint that a user (and periodical) must exist for the join table, but having the DB peform deletes via a trigger is often a mistake. – Hogan Nov 05 '11 at 19:27
  • 1
    Also see http://stackoverflow.com/questions/312003/what-is-the-most-ridiculous-pessimization-youve-seen – dkretz Nov 05 '11 at 19:29
  • @ledorfier, re: 'Databases are way fast enough not to worry about it', is reassuring, but several posts on SO & other places, had led me to believe otherwise. My approach to DB design is driven by the larger aim of developing a website that (I hope) would attract & host several hundred thousand (if not million) subscribers. And in all questions about design choices, framework choices, platform choices, I've been told that there is only so much you can do on that end, since the bottleneck, eventually is going to be your DB. However, I think that I do get the draft :-). – bdutta74 Nov 05 '11 at 19:56
  • 1
    @icarus, you're right, the real questions are more like 1. When is a performance optimization premature; 2. when has a database design reached its performance limit (vs. what queries aren't sufficiently optimized and what configuration variables aren't optimized); 3. Have we done all the normalization that's practical; 4. Have we examined the metrics completely, and know for certain what limits in the hardware and software are constraining performance? When you've answered all those questions then you can think about sacrificing design. Too often people sacrifice design integrity too early. – dkretz Nov 05 '11 at 20:11
  • if possible kindly review [the SQL schema](http://pastebin.com/7eLjmw6S) generated using MySQL Workbench. Currently the FK constraints are NO_ACTION. I wasn't sure how/where to turn them to CASCADED, or should I let it be ? – bdutta74 Nov 05 '11 at 20:17
  • 1
    I think it's fine. There are subtleties that generally start arguments and don't matter, and I think you've made good choices. I think cascades are too much like triggers and should be avoided in general. Philosophically, a subscription exists/existed whether it has expired/been cancelled or not, so there's a risk wiping out all derivative information. I usually prefer to have an "active" column (I notice you have "status") and then have maybe a view on ActiveSubscriptions. Same point for other tables. I only ever delete data for data entry errors. Notice that archiving is a pessimization too. – dkretz Nov 05 '11 at 20:21
  • Thanks for your review and comments. They have really been helpful. – bdutta74 Nov 06 '11 at 04:29