7

We have a shopping cart as pictured below, The setup works well, except for one fatal flaw. If you place an order the order is linked to a product, so If I update the product after you have purchased the product there is no way for me to show you want the product looked like when you bought it (including price). This means we need versioning.

Current Schema

My plan at present is to, when a new product, or variant is created, or an existing one is edited, create a duplicate of the product or variant in the database. When a purchase is made, link the order to the version, not the product.

This seems rather simple, except from what I can see the only things we don't need to version are the categories (as no one cares what categories it was in.). So we need to version:

  • Products
  • Variants
  • The key -> value pairs of attributes for each version
  • The images

My current thinking is,

note: When a product is created a default variant is created as well, this cannot be removed.

  • When a product is created
    • Insert the product into the products table.
    • Create the default variant
    • Duplicate the product into the products_versions table
      • Replace current id column with a product_id column
      • Add id column
    • Duplicate the variant into the variants_versions table
      • Replace current id column with variant_id column
      • Add id column
      • Replace product_id column with product_version_id column

  • When a product is edited
    • Update the product into the products table.
    • Duplicate the product into the products_versions table
      • Replace current id column with a product_id column
      • Add id column
    • Duplicate all product variants into the variants_versions table
      • Replace current id column with variant_id column
      • Add id column
      • Replace product_id column with product_version_id column
    • Duplicate all variant_image_links into the variant_Image_link_version table
      • Replace current variant_id column with variant_version_id column

  • When a variant is added
    • Add the variant into the variants table.
    • Duplicate the product into the products_versions table
      • Replace current id column with a product_id column
      • Add id column
    • Duplicate all product variants into the variants_versions table
      • Replace current id column with variant_id column
      • Add id column
      • Replace product_id column with product_version_id column

  • When a variant is edited
    • Update the variant in the variants table.
    • Duplicate the product into the products_versions table
      • Replace current id column with a product_id column
      • Add id column
    • Duplicate all product variants into the variants_versions table
      • Replace current id column with variant_id column
      • Add id column
      • Replace product_id column with product_version_id column
    • Duplicate all variant_image_links into the variant_Image_link_version table
      • Replace current variant_id column with variant_version_id column

So the final structure looks like Full Size

Now this all seems great, except it seems like a heck of a lot of duplicated data, e.g. if we update a product we duplicate the variants even though they would not have been updated since they were inserted. Also, this seems like a lot of work.

Is there a better way of doing this?

Hailwood
  • 89,623
  • 107
  • 270
  • 423
  • What program did you use to draw those diagrams ? I like how it looks. – Radu Murzea Nov 13 '12 at 13:03
  • 1
    @SoboLAN http://dbdsgnr.appspot.com/ – Hailwood Nov 13 '12 at 13:06
  • +1 for the dbdsgnr.appspot.com info and +1 for the question. I'm surprised that this hasn't been covered in a "standard database models" book. If it has, I'd like to know which one. Can't recall the other book which is a massive collection of known/required models for different industry fields. – aneroid Nov 16 '12 at 05:09
  • This the book I was referring to: [The Data Model Resource Book - Len Silverston](http://books.google.com.my/books?id=XkBPl4Ted40C). But it doesn't cover **generic data versioning**. – aneroid Nov 16 '12 at 05:20

2 Answers2

5

You can do what ERP (and also possibly Payroll) systems do: Add a Start and End Date/Time. So...

  • the variant and prices match with their product based on the common dates.
  • all queries default to running on current date and the joins between each table need to also take into account the overlapping/intersecting date ranges. parent_start_date <= child_start_date AND parent_end_date >= child_end_date
  • You would end up with duplicated rows for each price change or variant but you then don't need to keep update as many records (like variant ids) when the product price changes.
  • Need to ensure valid dates are used. PS: Use your system's max date for the End datetime of the most current/recent record.

Btw, some related questions along the same line:

Community
  • 1
  • 1
aneroid
  • 12,983
  • 3
  • 36
  • 66
  • This sounds like a good idea, so lets see if I can break this down in the comments below, let me know if anything seems wrong. – Hailwood Nov 15 '12 at 20:01
  • **New Product Created** Add a product into the products table, Fill in a start date, leave end date null, the same with the default variant in the variants table. – Hailwood Nov 15 '12 at 20:02
  • **Product Edited** `UPDATE products SET end_date = NOW() WHERE id={id} AND end_date IS NULL` the insert the new product, leaving the end_date null. – Hailwood Nov 15 '12 at 20:04
  • **Variant Edited** `UPDATE products SET end_date = NOW() WHERE id={id} AND end_date IS NULL` `UPDATE variants SET end_date = NOW() WHERE id={id} AND end_date IS NULL` Insert edited variant as new row, setting the start date and leaving the end_date null, insert duplicate product, setting start_date – Hailwood Nov 15 '12 at 20:15
  • **Variant Added** `UPDATE products SET end_date = NOW() WHERE id={id} AND end_date IS NULL` Insert new variant as new row. Insert duplicate product setting start_date – Hailwood Nov 15 '12 at 20:16
  • **Image Added** Run ***Variant Edited*** Routine. Insert new link into variant_image_link setting start_date, leaving the end_date null – Hailwood Nov 15 '12 at 20:18
  • **Selecting latest version of product** `SELECT * FROM products WHERE product_id = {id} AND end_date IS NULL` – Hailwood Nov 15 '12 at 20:21
  • **Selecting variants for any version of product** `SELECT * FROM variants WHERE v.product_id = {id} AND variants.start_date <= {product.start_date} AND variants.end_date >= {parent.end_date}` I am not too sure about this one, I have switched the date check from what you have above, as don't we want to select variants that start before the product, and end after? – Hailwood Nov 15 '12 at 20:30
  • **Getting a specific version of a product based on date** .e.g when an order is placed, we store the order_date, so we can do `SELECT * FROM products WHERE product_id = {id} AND start_date <= {order_date} AND end_date IS >= {order_date}` But is this going to work if the end_date is null (I.E product has not been updated since that purchase?) – Hailwood Nov 15 '12 at 20:32
  • For **Selecting variants for any version of product**: A variant of a product shouldn't exist before the product ;-) Also, remember that the product will extend over a large time period so `variants.start_date <= {product.start_date}` (and the end date check) will always end up selecting the current variant of the _previous_ version/dates of the product which is still active (or something). Add `OR end_date IS NULL` to the end date check, within brackets. This is a case where the start/end date system becomes more complicated than an ID system. May be better once the queries are sorted out. – aneroid Nov 16 '12 at 04:40
  • For **Getting a specific version of a product based on date**: As above, add the end_date `NULL` check: `SELECT * FROM products WHERE product_id = {id} AND start_date <= {order_date} AND (end_date >= {order_date} OR end_date IS NULL)`. That should select the correct version of the product. (btw, the `IS` with `>=` is incorrect afaik). Also, `end_date IS NULL` check is why I recommended "system's max date" so you don't have to have that and reduce potential for query errors (may also help efficiency with indexes/references). In SAP, that's `High-Date = 31.12.9999`, or 99991231 actually. – aneroid Nov 16 '12 at 04:48
  • Another thing, where possible, I would say add the date modification stuff as triggers. Because you also need to take into account that if an insert/update fails, you need to revert the date values. Also, do you need a delete? A delete would be, delete record where end_date = NULL, then select `max` end_date and update to NULL. (I'll add your comments as your comments in my answer so that readers can see the intended usage.) – aneroid Nov 16 '12 at 04:53
0

Another approach to this would be to never edit or remove your data, only create new data. In SQL terms, the only operations you ever run on your tables are INSERTs and SELECTs.

To accomplish what you want, each table would need the following colums:

  • version_id - this would be your primary key
  • id - this would be the thing that holds versions of your object together (e.g. to find all versions of a product, SELECT * FROM products WHERE id = ?)
  • creation_date
  • is_active - you're not deleting anything, so you need to flag to (logically) get rid of data

With this, here's what your products table would look like:

CREATE TABLE products (
  version_id CHAR(8) NOT NULL PRIMARY KEY,
  id INTEGER NOT NULL,
  creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
  is_active BOOLEAN DEFAULT true,
  name VARCHAR(1024) NOT NULL,
  price INTEGER NOT NULL
);

CREATE TABLE variants (
  version_id CHAR(8) NOT NULL PRIMARY KEY,
  id INTEGER NOT NULL,
  creation_date TIMESTAMP NOT NULL DEFAULT NOW(),
  is_active BOOLEAN DEFAULT true,
  product_version_id CHAR(8) NOT NULL,
  price INTEGER NOT NULL,
  override_price INTEGER NOT NULL,
  FOREIGN KEY (product_version_id) REFERENCES products(version_id)
);

Now, to insert into either table

  1. Generate a unique version_id (there are several strategies for this, one is to use a database sequence, or for MySQL use ant AUTO_INCREMENT).
  2. Generate an id. This id is consistent for all versions of a product.

To update a row in a table, one must insert the entire graph e.g. to update a product, one must insert a new product, and new variants. (There is a lot of room for optimization here, but it's easiest to start with the un-optimized solution.)

For example, to update a product

  1. Generate a unique version_id
  2. Use the same id
  3. Insert new product variants. The variants will be the same as the ones linked to the previous version of the product that you're "updating", except the product_version_id will be different.

This principal can extend to all your tables.

To find the most recent version of a product, you need to use the creation_date column to get the product that was most recently created.

This model will use more space, but I think this may be a fair trade-off given it's simplicity: there are only INSERTs and SELECTs and data is never mutated.

three-cups
  • 4,375
  • 3
  • 31
  • 41
  • Isn't that similar to the answer by drew-pierce? Also, in your example **data still gets mutated** because you still have to update the `is_active` field. To be honest, I don't think there's anything wrong with the approach and sometimes a "`creation_date`" with an `is_active` flag + `version_number` is easier to trace. – aneroid Nov 16 '12 at 05:00
  • Yes, it is similar, but not the same. My solution has no updates. Data is not mutated. Why do you think the is_active field would need to be mutated? – three-cups Nov 16 '12 at 19:01
  • To be clear, the is_active field is meant to allow a product to be logically deleted from the catalog. One could add another field is_this_the_most_recent_version. That field would be mutated. Maybe this is where the confusion lies? – three-cups Nov 16 '12 at 19:02
  • Yes was confused about that part. But then in your model, the `is_this_the_most_recent_version` shouldn't be required - either max id or creation date should be enough. – aneroid Nov 17 '12 at 07:19
  • Correct, is_this_the_most_recent_version is an unnecessary optimization. – three-cups Nov 18 '12 at 00:51