3

First off, I'm not sure how exactly to search for this, so if it's a duplicate please excuse me. And I'm not even sure if it'd be better suited to one of the other StackExchange sites; if so, please let me know and I'll ask over there instead. Anyways...

Quick Overview of the Project

I'm working on a hobby project -- a writer's notebook of sorts -- to practice programming and database design. The basic structure is fairly simple: the user can create notebooks, and under each notebook they can create projects associated with that notebook. Maybe the notebook is for a series of short stories, and each project is for an individual story.

They can then add items (scenes, characters, etc.) to either a specific project within the notebook, or to the notebook itself so that it's not associated with a particular project. This way, they can have scenes or locations that span multiple projects, as well as having some that are specific to a particular project.

The Problem

I'm trying to keep a good amount of the logic within the database -- especially within the table structure and constraints if at all possible. The basic structure I have for a lot of the items is basically like this (I'm using MySql, but this is a pretty generic problem -- just mentioning it for the syntax):

CREATE TABLE SCENES(
    ID        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY NOT NULL,
    NOTEBOOK  BIGINT UNSIGNED NULL,
    PROJECT   BIGINT UNSIGNED NULL,
    ....
);

The problem is that I need to ensure that at least one of the two references, NOTEBOOK and/or PROJECT, are set. They don't have to both be set -- PROJECT has a reference to the NOTEBOOK it's in. I know I could just have a generic "Parent Id" field, but I don't believe it'd be possible to have a foreign key to two tables, right? There's also the possibility of adding additional cross-reference tables -- i.e. SCENES_X_NOTEBOOKS and SCENES_X_PROJECTS -- but that'd get out of hand pretty quickly, since I'd have to add similar tables for each of the different item types I'm working with. That would also introduce the problem of ensuring each item has an entry in the cross reference tables.

It'd be easy to put this kind of logic in a stored procedure or the application logic, but I'd really like to keep it in a constraint of some kind if at all possible, just to eliminate any possibility that the logic got bypassed some how.

Any thoughts? I'm interested in pretty much anything -- even if it involves a redesign of the tables or something.

MCory
  • 437
  • 2
  • 13
  • 1
    You're right -- I'm not 100% sure what the data model should be. And while I do agree that changing code is easier than changing database structure, this is probably more of an academic exercise -- how much of the requirements can I enforce without using app code, triggers, or stored procs? I guess the reasoning behind it would be that someone can always bypass a stored proc, or a Java class, and access the tables directly; it's a lot harder to bypass a UNIQUE or FOREIGN KEY constraint. – MCory Oct 06 '12 at 21:52
  • This isn't an answer, but it might help. I recommend reading Database in Depth by C.J. Date. The author is obsessed with enforcing data integrity in the database. – Jason Swett Oct 07 '12 at 01:52

2 Answers2

3

The thing about scenes and characters is that a writer might drop them from their current project. When that happens, you don't want to lose the scenes and characters, because the writer might decide to use them years later.

I think the simplest approach is to redefine this:

They can then add items (scenes, characters, etc.) to either a specific project within the notebook, or to the notebook itself so that it's not associated with a particular project.

Instead of that, think about saying this.

They can then add items (scenes, characters, etc.) to either a user-defined project within the notebook, or to the system-defined project named "Unassigned". The project "Unassigned" is for things not currently assigned to a user-defined project.

If you do that, then scenes and characters will always be assigned to a project--either to a user-defined project, or to the system-defined project named "Unassigned".

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
2

I'm unclear as to what exactly are you requirements, but let me at least try to answer some of your individual questions...

The problem is that I need to ensure that at least one of the two references, NOTEBOOK and/or PROJECT, are set.

CHECK (NOTEBOOK IS NOT NULL OR PROJECT IS NOT NULL)

I don't believe it'd be possible to have a foreign key to two tables, right?

Theoretically, you can reference two tables from the same field, but this would mean both of these tables would have to contain the matching row. This is probably not what you want.

You are on the right track here - let the NOTEBOOK be the child endpoint of a FK towards one table and the PROJECT towards the other. A NULL foreign key will not be enforced, so you don't have to set both of them.

There's also the possibility of adding additional cross-reference tables -- i.e. SCENES_X_NOTEBOOKS and SCENES_X_PROJECTS -- but that'd get out of hand pretty quickly, since I'd have to add similar tables for each of the different item types I'm working with.

If you are talking about junction (aka. link) tables that model many-to-many relationships, then yes - you'd have to add them for each pair of tables engaged in such a relationship.

You could, however, minimize the number of such table pairs by using inheritance (aka. category, subclassing, subtype, generalization hierarchy...). Imagine you have a set of M tables that have to be connected to a second set of N tables. Normally, you'd have create M*N junction tables. But if you inherit all tables in the first set from a common parent table, and do the same for the second set, you can now connect them all through just one junction table between these two parent tables.

The full discussion on inheritance is beyond the scope here, but you might want to look at "ERwin Methods Guide", chapter "Subtype Relationships", as well as this post.

It'd be easy to put this kind of logic in a stored procedure or the application logic, but I'd really like to keep it in a constraint of some kind if at all possible, just to eliminate any possibility that the logic got bypassed some how.

Your instincts are correct - make database "defend" itself from the bad data as much as possible. Here is the order of preference for ensuring the correctness of your data:

  • The structure of tables themselves.
  • The declarative database constraints (integrity of domain, integrity of key and referential integrity).
  • Triggers and stored procedures.
  • Middle tier.
  • Client.

For example, if you can ensure a certain logic must be followed just by using declarative database constraints, don't put it in triggers.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • Thank you! I'd completely forgotten about CHECK constraints -- that would definitely help. Edit: I'll gladly accept this answer, but I do want to see if there's any other ideas out there. As it stands right now, I think this is pretty much the right idea; I just want to give it a little bit to see if there's other possibilities. (I'd meant to just add a new line; didn't think it'd submit) – MCory Oct 07 '12 at 01:06
  • MySQL doesn't enforce CHECK() constraints. – Mike Sherrill 'Cat Recall' Oct 07 '12 at 01:15
  • @MCory I completely missed that you are using MySQL - Catcall is correct about CHECK not being enforced there. Please use tags to indicate "target" DBMS more clearly to distracted people like me ;) – Branko Dimitrijevic Oct 07 '12 at 02:08
  • Well, like I said -- this was mostly an academic issue, and I only mentioned that it was MySql in case someone wanted to give me problems about using AUTO_INCREMENT. And as I progressed with the project, it actually became moot anyways -- the main difference between "Projects" and "Notebooks" ended up being semantics. By just having nested Notebooks, I've eliminated the issue altogether. BUT! Your answer was exactly what I was hoping for -- RDBMS support or not! Thanks Branko! – MCory Oct 10 '12 at 00:24