1

We have been developing the system at my place of work for sometime now and I feel the database design is getting out of hand somewhat.

For example we have a table widgets (I'm spoofing these somewhat):

+-----------------------+
|        Widget         |
+-----------------------+
| Id | Name     | Price |
| 1  | Sprocket | 100   |
| 2  | Dynamo   | 50    |
+-----------------------+
*There's about 40+ columns on this table already

We want to add on a property for each widget for packaging information. We need to know if it has packaging information, if it doesn't have packaging information or we don't know if it does or doesn't. We then need to also store the type of packaging details (assuming it does or maybe it doesn't and it's reduntant info now).

We already have another table which stores the details information information (I personally think this table should be divided up but that's another issue). PD = PackageDetails

+--------------------------------+
|       System Properties        |
+--------------------------------+
| Id  | Type     | Value          |
| 28  | PD       | Boxed          |
| 29  | PD       | Vacuum Sealed  |
+--------------------------------+
*There's thousands of rows in the table for all system wide table properties

Instinctively I would create a number of mapping tables to capture this information. I have however been instructed to just add another column onto each table to avoid doing a join.

My solution:

Create tables:

+---------------------------------------------------+
|                widgets_packaging                  |
+---------------------------------------------------+
| Id | widget_id | packing_info | packing_detail_id |
| 1  | 27        | PACKAGED     | 2                 |
| 2  | 28        | UNKNOWN      | NULL              |
+---------------------------------------------------+

+--------------------+
|     packaging      |
+--------------------+
| Id |               |
| 1  | Boxed         |
| 2  | Vacuum Sealed |
+--------------------+

If I want to know what packaging a widget has I join through to widgets_packaging and join again to packaging if I want to know the exact details. Therefore no more columns on the widgets table.

I have however been told to ignore this and put the value int for the packing information and another as a foreign key to System Properties table to find the packaging details. Therefore adding another two columns to the table and creating yet more rows in the system properties table to store package details.

+------------------------------------------------------------+
|                           Widget                           |
+------------------------------------------------------------+
|  Id | Name      |Price | has_packaging | packaging_details |
|  1  | Sprocket  |100   | 1             |  28               |
|  2  | Dynamo    |50    | 0             |  29               |
+------------------------------------------------------------+

The reason for this is because it's simpler and doesn't involve a join if you only want to know if the widget has packaging (there are lots of widgets). They are concerned that more joins will slow things down.

Which is the more correctly solution here and are their concerns about speed legitimate? My gut instint is that we can't just keep adding columns onto the widgets table as it is growing and growing with flags for properties at present.

tom808
  • 320
  • 2
  • 11
  • 3
    Your normalized solution is mathematically superior from a design, data consistency and performance standpoint. Do a little research on Database Normalization (https://en.wikipedia.org/wiki/Database_normalization) and 3NF (https://en.wikipedia.org/wiki/Third_normal_form) so that you can combat naysayers. –  Jan 21 '16 at 16:01
  • I added a tag; search for [mysql] [entity-attribute-value] for many discussions of similar tasks. – Rick James Jan 23 '16 at 21:16

1 Answers1

1

The answer to this really depends on whether the application(s) using this database are read or write intensive. If it's read intensive, the de-normalized structure is a better approach because you can make use of indexes. Selects are faster with fewer joins, too.

However, if your application is write intensive, normalization is a better approach (the structure you're suggesting is a more normalized approach). Tables tend to be smaller, which means they have a better chance of fitting into the buffer. Also, normalization tends to lead to less duplication of data, which means updates and inserts only need to be done in one place.

To sum it up:

Write Intensive --> normalization

  • smaller tables have a better chance of fitting into the buffer
  • less duplicated data, which means quicker updates / inserts

Read Intensive --> de-normalization

  • better structure for indexes
  • fewer joins means better performance

If your application is not heavily weighted toward reads over writes, then a more mixed approach would be better.

devlin carnate
  • 8,309
  • 7
  • 48
  • 82
  • @tom808 - that is one answer, and there are several other answers on that page that argue differently. I think the important aspect of the answer you linked is that it pertains to a Data Warehouse, which is not the same animal as a few tables in a mysql database. It's also comparing complete normalization vs complete un-normalization, which again, is not the OP's scenario. – devlin carnate Jan 29 '16 at 15:36