1

I'm trying to design a database schema to keep records on for sale / want ads. I'd like to be able to query, for instance "what was the asking price of posts with this keyword in the title for the past 6 months?"

But the posts themselves can be updated and modified by the posters, so I'd also like to be able to ask questions like, "What revisions did this specific post go through? How did the asking price change?"

My plan with the schema below was to have separate tables for each field I took from the post, keyed by post id and the time they were last updated. I'd also have one table (the last one) that showed how the post looked at each update.

So if there was a post with id, date, price, title, and description, there would be one entry in the post_id, post_date, post_price, post_title, and description tables, as well as one in the post_state table. If the user updated just the price, new entries would be added to the post_date, post_price and post_states table, and the new row in the post_states table would show the new price, but the original values in the other fields.

Schema follows. The problem I'm getting is that the post_states table cannot be created, because the fields in the tables it references are not unique. I'm new at this, and it may be that I'm doing it completely wrong. Also, I'm using PostgreSQL.

CREATE TABLE IF NOT EXISTS post_id (
    id integer PRIMARY KEY,
    url varchar
);

CREATE TABLE IF NOT EXISTS modify_date (
    mod_date time,
    pid integer REFERENCES post_id,
    UNIQUE (mod_date, pid),
    PRIMARY KEY(mod_date, pid)
);

CREATE TABLE IF NOT EXISTS post_price (
    pid integer,
    mod_date time,
    price money,
    FOREIGN KEY(mod_date, pid) references modify_date
);

CREATE TABLE IF NOT EXISTS post_title (
    pid integer,
    mod_date time,
    title varchar,
    FOREIGN KEY(mod_date, pid) references modify_date
);

CREATE TABLE IF NOT EXISTS post_location (
    pid integer,
    mod_date time,
    location varchar,
    FOREIGN KEY(mod_date, pid) references modify_date
);

CREATE TABLE IF NOT EXISTS post_email (
    pid integer,
    mod_date time,
    email varchar,
    FOREIGN KEY(mod_date, pid) references modify_date
);

CREATE TABLE IF NOT EXISTS post_description (
    pid integer,
    mod_date time,
    description varchar,
    FOREIGN KEY(mod_date, pid) references modify_date
);

CREATE TABLE IF NOT EXISTS post_state (
    pid integer,
    mod_date time,
    title varchar REFERENCES post_title(title),
    description varchar REFERENCES post_description(description),
    price money REFERENCES post_price(price),
    location varchar REFERENCES post_location(price),
    email varchar REFERENCES post_email(email),
    url varchar REFERENCES post_id(url),
    FOREIGN KEY(mod_date, pid) references modify_date
);
tasteslikelemons
  • 397
  • 1
  • 2
  • 8
  • You should *at least* replace all your `mod_date time` by `mod_date timestamp with tome zone,` and you don't need the unique constraint if the same fields have a PK constraint on them. – wildplasser Sep 15 '13 at 13:57
  • 1
    Look into "Slowly Changing Dimensions" and, possibly this question: http://stackoverflow.com/questions/6318317/temporal-database-design-with-a-twist-live-vs-draft-rows – Denis de Bernardy Sep 15 '13 at 14:23

1 Answers1

0

Two quick points.

For revision tracking as long as you have a date tracked and a revision number, I think you would do well to look at table_log, which can be used to log previous versions of the log.

Secondly not having primary keys on most of your tables is asking for trouble down the road. I prefer natural primary keys where applicable but you should at least add a surrogate key if this is not possible.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182