2

I am new to using databases and am trying to design a new database where I think I need a one-to-one relationship spread across many tables.

To demonstrate my design, let's say I am building a schedule database as an example. I start by creating a table for a person with a one-to-many relationship

CREATE TABLE person (
    person_id SERIAL NOT NULL, 
    name VARCHAR,
    PRIMARY KEY (person_id)
);

Next, I create a table of events which contains the many portion of the person relationship

CREATE TABLE events (
    event_id SERIAL NOT NULL, 
    type VARCHAR,
    name VARCHAR,
    person_id INTEGER,
    time TIMESTAMP WITHOUT TIME ZONE,
    PRIMARY KEY (event_id),
    FOREIGN KEY(person_id) REFERENCES person (person_id)
);

Now lets say I have two different types of events that have different information about them, say meal and homework

CREATE TABLE meals (
    event_id INTEGER NOT NULL, 
    food VARCHAR,
    utensils VARCHAR,
    PRIMARY KEY (event_id),
    FOREIGN KEY(event_id) REFERENCES events (event_id)
);

CREATE TABLE homework (
    event_id INTEGER NOT NULL, 
    subject VARCHAR,
    completed BOOLEAN,
    score FLOAT,
    PRIMARY KEY (event_id),
    FOREIGN KEY(event_id) REFERENCES events (event_id)
);

Now, the reason I'm trying to design my database this way is because sometimes, you may just want to display a basic list of events for each person, regardless of what that event is. For instance, if I initialize my tables as follows

INSERT INTO person (name) VALUES ('Brad');
INSERT INTO events (type, name, person_id, time) VALUES ('meal', 'lunch', 1, '12/28/2016 12:00:00')    
INSERT INTO events (type, name, person_id, time) VALUES ('meal', 'breakfast', 1, '12/28/2016 12:00:00');
INSERT INTO meals (event_id, food, utensils) VALUES (1, 'eggs', 'fork');
INSERT INTO meals (event_id, food, utensils) VALUES (2, 'turkey sandwich', 'hands');
INSERT INTO events (type, name, person_id, time) VALUES ('homework', 'final project', 1, '12/28/2016 18:00:00');
INSERT INTO homework (event_id, subject, completed, score) VALUES (3, 'Math', 'T', 0.93);

Then I may want to generate a list of all events for Brad

SELECT (events.time, events.type, events.name) FROM events 
LEFT JOIN person ON person.person_id = events.person_id 
WHERE person.name = 'Brad';

That is all easy, where I'm confused is, what if I want to see what Brad has eaten. I think I could probably use two JOIN statements between person and events and events and meals, but what if I just want to walk through Brads events and get all of the extra information about each event, (for instance, if the event is a meal, tell me what he ate, if it is homework, tell me the score he got)?

Overall I have a couple questions.

  1. Is this a good database design or is there something else I should consider? Each of the potential use cases above, plus a couple more are all standard things I will need to use the database for.
  2. How can I easily determine which table to look in for more information for any given event in the events table? A couple of thoughts here -- I could store the name of the other table that contains more information about the event in the events table (ie, replace the type column with a table column) but I think I read somewhere that was a bad idea.

A couple other notes, I am using Postgresql for the database. The actual database I'm building has a lot more detailed information for each table outside of what I've shown here. I was just trying to give an idea of what I'm trying to get to. Finally, I am building/accessing the database using sqlalchemy's ORM, so if there's a nifty trick I can do using relationships that would help with this that would be really useful to know as well.

Andrew
  • 693
  • 6
  • 19
  • Good question with a clear explanation. From my understanding i don't think your database design is that bad,i'd like to see what others think though. – Lewis Browne Aug 21 '17 at 14:16

2 Answers2

1

If you want to get all the details of each event, you're going to have a problem because the tables that hold the event details have different types of columns. And you certainly don't want to be hardcoding the various event details table names in your code, after all what happens when you want to add or remove a table, or change a name? You have to update your code everywhere!

So first of all I'd say you'll want a view here. Something like:

CREATE OR REPLACE VIEW event_details AS
    SELECT * FROM meals
    UNION ALL
    SELECT * FROM homework;

This would allow you to select the details of all the event types in a single go, e.g.

SELECT * FROM event_details WHERE event_id IN (
    SELECT event_id FROM events WHERE person_id = (
        SELECT person_id
        FROM person
        WHERE name = 'Brad'
    )
)

Except it doesn't work of course, because the table structures are different. So you'll need to find some way to represent the data in a uniform way; for example, doing a ROW_TO_JSON on each record:

CREATE OR REPLACE VIEW event_details AS
    SELECT ROW_TO_JSON(meals.*) AS details FROM meals
    UNION ALL
    SELECT ROW_TO_JSON(homework.*) AS details FROM homework;

And now this query:

SELECT * FROM event_details WHERE (details->>'event_id')::INTEGER IN (
    SELECT event_id FROM events WHERE person_id = (
        SELECT person_id
        FROM person
        WHERE name = 'Brad'
    )
)

Gives you:

{"event_id":1,"food":"eggs","utensils":"fork"}
{"event_id":2,"food":"turkey sandwich","utensils":"hands"}
{"event_id":3,"subject":"Math","completed":true,"score":0.93}

And you can then parse the JSON and do what you want with it. And when you want to add or remove or rename a table, you do it in the view only.

Now mind you, I'm not saying this is a great (or the only) way to do it. It's not clear to me there's a good reason to have a separate table per event type, rather than just having one events table and putting the type-specific data in a JSONB field. It would make querying much easier and faster, and if you use JSONB the type-specific data could be indexed as well. I think that would be a better design, based on the examples you've shown.

404
  • 8,022
  • 2
  • 27
  • 47
  • Thanks for the detailed reply. I think I could see the benefit of having a single events table where all of the common columns are standard columns, and then the remaining data is put into a JSONB field. It would be very easy to quickly just get a list of events and then the application can decide whether it needs the extra data or not. Are there any drawbacks to this architecture? – Andrew Aug 21 '17 at 16:06
  • I can't really comment on the drawbacks as I don't have much experience in this area. I can tell you that storing blobs of data in a way where it's difficult or slow to query definitely sucks, but JSONB gets around both issues. Generally I will always prefer a flat table structure over storing a blob of JSONB data, and there's certainly the option of having more "detail" fields in your single events table and only reading the data from the fields relevant to the type. But if you have most of the data you query as normal fields, then store a little data as JSONB, I would be happy with that. – 404 Aug 21 '17 at 16:34
  • Ok, something to think about. I'll leave the question unanswered for the rest of the day to see if there are any other thoughts/solutions out there and then accept an answer tomorrow. – Andrew Aug 21 '17 at 16:40
0

So @eurotrash's answer answered the questions I asked beautifully, so I've accepted his as the correct answer, but based on his answer I've come up with what I think is a much better design for this database that I wanted to share in case someone else had a similar question. Essentially, we will remove the events table and instead make a materialized view to represent the events information. First, we need to modify how the meals and homework tables are set up and include the information that was formerly in the events table

CREATE TABLE meals (
    meal_id SERIAL NOT NULL, 
    name VARCHAR,
    person_id INTEGER,
    time TIMESTAMP WITHOUT TIME ZONE,
    food VARCHAR,
    utensils VARCHAR,
    PRIMARY KEY (meals_id),
    FOREIGN KEY(person_id) REFERENCES person (person_id)
);

CREATE TABLE homework (
    homework_id SERIAL NOT NULL, 
    name VARCHAR,
    person_id INTEGER,
    time TIMESTAMP WITHOUT TIME ZONE,
    subject VARCHAR,
    completed BOOLEAN,
    score FLOAT,
    PRIMARY KEY (homework_id),
    FOREIGN KEY(person_id) REFERENCES person (person_id)
);

Now, we can initialize our database using:

INSERT INTO person (name) VALUES ('Brad');
INSERT INTO meals (name, person_id, time, food, utensils) VALUES ('breakfast', 1, '12/28/2016 6:00:00', 'eggs', 'fork');
INSERT INTO meals (name, person_id, time, food, utensils) VALUES ('lunch', 1, '12/28/2016 12:00:00', 'turkey sandwich', 'hands');
INSERT INTO homework (name, person_id, time, subject, completed, score) VALUES ('final project', 1, '12/28/2016 18:00:00', 'Math', 'T', 0.93);

And then create a new material view of the common information using

CREATE MATERIALIZED VIEW events AS 
SELECT meal_id as id, 'meals' as table, name, person_id, time FROM meals
UNION ALL
SELECT homework_id as id, 'homework' as table, name, person_id, time from homework;

which gives

 id |  table   |     name      | person_id |        time
----+----------+---------------+-----------+---------------------
  1 | meals    | breakfast     |         1 | 2016-12-28 06:00:00
  2 | meals    | lunch         |         1 | 2016-12-28 12:00:00
  1 | homework | final project |         1 | 2016-12-28 18:00:00

Finally, to ensure that the events view is always up to date we can create triggers to update the view whenever meals or homework are changed according to https://stackoverflow.com/a/23963969/3431189

CREATE OR REPLACE FUNCTION refresh_events_view()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW events;
    RETURN null;
end $$;

CREATE TRIGGER refresh_events_view
AFTER INSERT or UPDATE or DELETE or TRUNCATE
ON meals FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_events_view();

CREATE TRIGGER refresh_events_view
AFTER INSERT or UPDATE or DELETE or TRUNCATE
ON homework FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_events_view();

This gives us the best of both worlds (at least in my opinion) as each specific field for meals and homework still exists, and we still get an always up-to-date events "table" that we can use to quickly query if we just want the basic information about each event (ie name, time, etc).

Andrew
  • 693
  • 6
  • 19