0

I have a table of Events and a number of subtables of different types to define each event.

Event table has an ID, a (potentially non-unique) Datetime, userId, and an event typecode.

For the Event Type tables, say ExerciseStart... is EventId an adequate primary key? Each event is only one type, so this should never double up. And is it useful to include the datetime in the subcategories, even perhaps as a compound primary key to avoid accidental doubling up and potentially make query writing easier? (though it would also be redundant I feel).

(Unless I were to use Datetime and event typecode as a compound key, but that seems more potentially risky to me)

Damon
  • 10,493
  • 16
  • 86
  • 144
  • 1
    It would help if you provide use with some specifics about the structure of the tables. On a different note, whenever you use a surrogate key (I'm assuming EventId is auto incrementing value), you should also define another unique constraint on the table consisting of columns other than the PK. E.g., in your Event table, what else uniquely identifies an Event: DateTime, UserId And TypeCode combined? – Thomas Feb 02 '11 at 05:57
  • User/DateTime/Typecode should theoretically be unique I guess. I'm still not used to not just having a synthetic key for everything is that is how I was taught in school and have been doing for a couple years. There's more info about the specific database here: http://stackoverflow.com/questions/4801628/one-table-or-many-for-many-different-but-interacting-events – Damon Feb 02 '11 at 06:14
  • 1
    @Damon, Sadly it seems that some data management courses are utterly failing to teach sensible use of keys in database design. I have come across others like yourself who appear to have learnt that only surrogate keys are worth considering. This is impractical in the real world where natural keys are far more important to getting successful results. – nvogel Feb 02 '11 at 17:13
  • no it's on another tangent a bit. i haven't built anything as of yet, but as i noted in the other thread, i realized that many events will have the same userID and timestamp because they happen simultaneously. including eventcode in the primary key would likely work; theoretically there should be no overlap. But not knowing exactly how information will be entered esp as it expands to support reading other insulin pumps/glucometers, I'm not sure it's enough of a guarantee – Damon Feb 03 '11 at 00:42
  • @Damon. I don't think this is re my Data Model, because it does not have an ID (unique or not). If it is, then uniqiueness is assured in (DateTime, EventTypeCode) due to millisecond resolution. If millisec resolution is not enough, then use an *additional* SequenceNo column. If it isn't, could you please post the DDL you are contemplating. The answer is easy, definitive. Also, you either have SQL xor MyNonSQL (which is not SQL). – PerformanceDBA Feb 03 '11 at 03:34
  • i should have mentioned, a lot of the data i'm getting is being pulled from an external (and somewhat disorganized) csv which groups certain event types under one row and often has events on the same second, which is as granular as it gets. I'm using SQLite locally, but at some point will need to synchronize remotely via mysql – Damon Feb 03 '11 at 04:01
  • @Damon. No problem. There is a way to handle that correctly, will put it in the next version of the DM. `Id` keys will not fix the problem anyway. – PerformanceDBA Feb 04 '11 at 09:28

2 Answers2

0

I think what you need is called class table interitance: you have a envent_table that contains all common properties and many tabele for different types of events.

It could be something like this:

CREATE TABLE events(
    event_id SERIAL PRIMARY KEY,
    ... other fields
);

CREATE TABLE event_typeN(
    event_id BIGINT UNSIGNED PRIMARY KEY,
    ...other fields,
    FOREIGN KEY(event_id) REFERENCES events(event_id)
);
YasirA
  • 9,531
  • 2
  • 40
  • 61
Oleg
  • 2,733
  • 7
  • 39
  • 62
  • yeah i have something like that. My question is more specifically about whether I really need a sperate primary key for my event_typeN table since each event has a separate parent with its own unique key, so why not just use the foreign key.. – Damon Feb 02 '11 at 06:15
  • Tables with different types of events need primary keys (as almost any talbes), so having event_id in typeN as primary key is ok. – Oleg Feb 02 '11 at 06:21
0

Using Standard SQL...

The usual way to do 'inheritence' or 'subclassing' is to have a compound key on (event_type, event_ID) and use this key in the referencing 'subtype' tables with a CHECK constraint to ensure the event_type is appropriate for that 'subtype' e.g. CHECK (event_type = 'Exercise start').

An additional touch is to make the event_type DEFAULT in the 'subtype' table again match the appropriate type e.g. event_type VARCHAR(20) DEFAULT 'Exercise start' NOT NULL.

The structure could look like this:

CREATE TABLE EventTypes 
(
 event_type VARCHAR(20) NOT NULL PRIMARY KEY
);

CREATE TABLE Events 
(
 event_ID CHAR(10) NOT NULL, 
 event_type VARCHAR(20) NOT NULL
    REFERENCES EventTypes (event_type), 
 event_date DATE NOT NULL, 
 PRIMARY KEY (event_type, event_ID)
);

CREATE TABLE ExerciseStartEvents 
(
 event_ID CHAR(10) NOT NULL, 
 event_type VARCHAR(20) DEFAULT 'Exercise start' NOT NULL
    CHECK (event_type = 'Exercise start'), 
 FOREIGN KEY (event_type, event_ID)
    REFERENCES Events (event_type, event_ID), 
 PRIMARY KEY (event_type, event_ID), 
 exercise_description VARCHAR(30) -- etc --
);

However, there is a big problem with mySQL in that it does not enforce CHECK constraints :( [How anyone can tolerate this situation is beyond me!] So to answer your question, having a simple key on event_ID alone is not adequate because you can't enforce the appropriate type in the 'subtype' tables.

While it may be tempting to 'promote' event_ID to be a candidate key in the subtype table, this may not be a good idea. Although it would allow only one row for a given event_ID to appear in the table, if that row is of the wrong type it would prevent the row with the correct type from being inserted!

Solution? Move to a better SQL implementation ;)

onedaywhen
  • 55,269
  • 12
  • 100
  • 138
  • @onedaywhen. Can you explain how the ID key "prevents duplicates" ? – PerformanceDBA Feb 02 '11 at 16:05
  • Ok, so the statement that "ID keys prevent duplicates" is patently false. – PerformanceDBA Feb 02 '11 at 23:16
  • the reason for the 'prevents duplicates' is that in programming with a database there are many situations where you need to return 1 specific record, and with a synthetic auto-incrementing primary key, when you ask for a record by key you will always get one back no matter what. How I understand it is that Id keys prevent a certain kind of duplication that is of most concern to many developers. but not the kind that's of concern to db designers. – Damon Feb 03 '11 at 00:45
  • Like if two people have the same name, give the same SSN and birthday for some bizarre reason..same everything. you can end up with two identical records. when really there should be two distinct records that should happen to have the same information in them. there's workarounds in both scenarious of course. but a synthetic key is a guarantee that it won't happen, and the programmers can worry about what happens if it does – Damon Feb 03 '11 at 00:50
  • @PerformanceDBS: I didn't make the statement, "ID keys prevent duplicates" The point I was trying to make is that, because mySQL does not enforce the `CHECK` constraint in the 'subtype' table, the compound key is no better (and could be considered worse) than a simple key on `event_ID` alone. – onedaywhen Feb 03 '11 at 08:45
  • @PerformanceDBS: "'ID keys prevent duplicates' is patently false" -- I'm not so sure. If there is a business rule that states that `ID` is indeed a key *AND* `ID` has been declared as a key in the DBMS *AND* the DBMS correctly enforces the key then "ID keys prevent duplicates" is kind of correct. Of course, the DBMS cannot know what `ID` represents in reality. – onedaywhen Feb 03 '11 at 08:54
  • @PerformanceDBA: read up on the "Closed World Assumption": "by convention, rows consistent with the constraints are deemed by the DBMS representatives of true propositions that must occur in the database, and rows in violation of the constraints as representative of false propositions that should not occur in the database." – onedaywhen Feb 03 '11 at 09:08
  • @Damon: "you can end up with two identical records. when really there should be two distinct records that should happen to have the same information in them... the programmers can worry about what happens if it does" -- sounds to me like that it would be the users who would suffer in either situation! However, if there were a key over all attributes then "two identical records" should not occur. – onedaywhen Feb 03 '11 at 09:14
  • @PerformanceDBA: "...Databases do not 'reflect the truth consistently,' but rather, the DBMS ensures that they represent what, by convention, we assume to be true: propositions that do not violate the business rules." – onedaywhen Feb 03 '11 at 09:16
  • @onedaywhen. What does "the simple event_ID key is better than the compound because at least it prevents duplicates!" mean, if it doesn't mean that "ID keys prevents duplicates" ? are you running around in circles ? Do we need to quote you verbatim because the meaning is lost otherwise ? Re ID keys, the whole point is that they are meaningless, not keys seen by the user, if they are meaningful, then give them a name. Instead of all this gobbledegook, simply posts an example of how "ID keys prevent duplicate **rows** ". – PerformanceDBA Feb 03 '11 at 13:20
  • @onedaywhen. Thanks, but I take my reading recommendations from people who are clear, who do not run themselves in voluminous circles, saying nothing. – PerformanceDBA Feb 03 '11 at 13:22
  • @PerformanceDBA: my, have you got the wrong end of the stick! Just because I use an data element name ending in ID (and in this case only because the OP did) this does not mean that I advocate using surrogate keys. Quote me anything I've ever written anywhere where I have said I advocate using surrogate keys. If you like, I can cite various sources where I state the opposite i.e. that I am an advocate of natural keys and shun surrogate keys. – onedaywhen Feb 03 '11 at 16:33
  • @PerformanceDBA: http://stackoverflow.com/questions/963809/should-i-use-composite-primary-keys-or-not/964143#964143 | http://stackoverflow.com/questions/3225558/move-child-rows-to-new-parent-fk-id/3226591#3226591 | http://stackoverflow.com/questions/3162202/sql-primary-key-integer-varchar/3165005#3165005 – onedaywhen Feb 04 '11 at 08:15
  • @onedaywhen. Those answers are good; the explanantions above not so good. Ok, you are squarely on my side of the fence, and in the minority (notice how those with high rep and completely incorrect answers get voted up!). I am less religious than you; there is a place for surrogates. So I am left with only a small bone to pick with you: change that last line in your answer "the simple event_ID key ... at least it prevents duplicates!" and I can give you a vote. – PerformanceDBA Feb 04 '11 at 09:27
  • @PerformanceDBA: on reflection, I was too lenient. I've updated my answer accordingly. Don't get me wrong, I do use surrogates occasionally but I do probably have a tenancy to be dogmatic about natural keys ;) – onedaywhen Feb 04 '11 at 10:14
  • @onedaywhen. The surrogates here are Damon's, not mine! For simple and complete implementation of RI for Subtypes, check [this answer](http://stackoverflow.com/questions/4896831/how-to-implement-referential-integrity-in-subtypes/4898314#4898314) – PerformanceDBA Feb 05 '11 at 08:59