Should I roll up multiple values into a list within a single column or use a separate table?
In SQL Server, I have a table that stores product information. It has a child table that stores information specific to each part within the product, including dimensions, tooling, etc. Some of these fields, such as tooling and cell sizes, have multiple values. I was planning to store these fields in a child table that references the product's ID. For a single product, that table would look something like this:
Option 1
Product ID | Part No | Cell No | Cell Size |
---|---|---|---|
1 | 1 | 1 | 1.0625 |
1 | 1 | 2 | 4 |
1 | 1 | 3 | 1.0625 |
1 | 2 | 1 | 1.5 |
1 | 2 | 2 | 2.03125 |
1 | 2 | 3 | 4.75 |
1 | 2 | 4 | 1 |
... where the number of cells per part can range from 2 to 20 and the number of parts per product from 1 to 6. Each part has roughly 3 to 7 multi-valued fields like this one.
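For reference, here is a minimal sketch of Option 1 as a typed child table. It uses Python's built-in `sqlite3` module only as a runnable stand-in for SQL Server. The table name `part_cell`, its column names, and the `revision` column are hypothetical. In SQL Server, `cell_size` would more likely be something like `DECIMAL(7,5)`. `REAL` is exact here only because the sample sizes happen to be binary fractions.

```python
import sqlite3

# In-memory SQLite is only a runnable stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")

# Option 1: one row per cell. Table/column names and the revision column are
# hypothetical; in SQL Server, cell_size might be DECIMAL(7,5) instead of REAL.
conn.execute("""
    CREATE TABLE part_cell (
        product_id INTEGER NOT NULL,
        revision   INTEGER NOT NULL,
        part_no    INTEGER NOT NULL,
        cell_no    INTEGER NOT NULL,
        cell_size  REAL    NOT NULL,
        PRIMARY KEY (product_id, revision, part_no, cell_no)
    )
""")

# The sample rows from the Option 1 table, all stored as revision 1.
conn.executemany(
    "INSERT INTO part_cell VALUES (1, 1, ?, ?, ?)",
    [(1, 1, 1.0625), (1, 2, 4.0), (1, 3, 1.0625),
     (2, 1, 1.5), (2, 2, 2.03125), (2, 3, 4.75), (2, 4, 1.0)],
)

# Read one part's sizes back, in cell order, as typed numbers.
part2 = [size for (size,) in conn.execute(
    "SELECT cell_size FROM part_cell "
    "WHERE product_id = 1 AND revision = 1 AND part_no = 2 "
    "ORDER BY cell_no")]
print(part2)  # [1.5, 2.03125, 4.75, 1.0]
```

The composite primary key also stops the same cell from being stored twice for a given product, revision, and part.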
Until now, every time a product was revised, we simply overwrote the data, incremented the product's revision number, and manually added a comment describing the kinds of changes made. However, we would now like to capture the full dataset with each revision. So, instead of overwriting a product's record when changes are made, we will create a new record with a new revision number each time the product is revised. This also means creating new records for each of the product's parts and for each part's dimensions.
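The "copy everything, then change what changed" revision step can be sketched as a single `INSERT ... SELECT`. This again uses `sqlite3` as a stand-in for SQL Server, and the helper `copy_revision` and the `part_cell` schema are hypothetical. The sketch shows that changing one cell still duplicates every cell row of the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for SQL Server
conn.execute("""
    CREATE TABLE part_cell (
        product_id INTEGER NOT NULL,
        revision   INTEGER NOT NULL,
        part_no    INTEGER NOT NULL,
        cell_no    INTEGER NOT NULL,
        cell_size  REAL    NOT NULL,
        PRIMARY KEY (product_id, revision, part_no, cell_no)
    )
""")
conn.executemany(
    "INSERT INTO part_cell VALUES (1, 1, ?, ?, ?)",
    [(1, 1, 1.0625), (1, 2, 4.0), (1, 3, 1.0625),
     (2, 1, 1.5), (2, 2, 2.03125), (2, 3, 4.75), (2, 4, 1.0)],
)

def copy_revision(conn, product_id, old_rev, new_rev):
    """Copy every cell row of old_rev into new_rev (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO part_cell (product_id, revision, part_no, cell_no, cell_size) "
            "SELECT product_id, ?, part_no, cell_no, cell_size "
            "FROM part_cell WHERE product_id = ? AND revision = ?",
            (new_rev, product_id, old_rev),
        )

copy_revision(conn, product_id=1, old_rev=1, new_rev=2)

# Apply the actual change only to the new revision; revision 1 stays intact.
with conn:
    conn.execute(
        "UPDATE part_cell SET cell_size = 4.5 "
        "WHERE product_id = 1 AND revision = 2 AND part_no = 1 AND cell_no = 2")

counts = dict(conn.execute(
    "SELECT revision, COUNT(*) FROM part_cell GROUP BY revision"))
print(counts)  # {1: 7, 2: 7}
```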
If we split the cell-size information (and similar data) into its own table, every revision could re-create a very large number of records. So, I considered rolling up the cell sizes (and similar data) into a list stored in a single column of the part table, like this:
Option 2
Product ID | Part No | Cell Size |
---|---|---|
1 | 1 | 1.0625, 4, 1.0625 |
1 | 2 | 1.5, 2.03125, 4.75, 1 |
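Going from Option 1 to Option 2 is a simple grouping step. Below is a minimal Python sketch, assuming the per-cell rows are already in memory. The function name `roll_up` is hypothetical. In SQL Server 2017 and later, `STRING_AGG(...) WITHIN GROUP (ORDER BY cell_no)` can do the equivalent on the server.

```python
from collections import defaultdict
from decimal import Decimal

# Option 1 rows for one product: (part_no, cell_no, cell_size).
option1_rows = [
    (1, 1, Decimal("1.0625")), (1, 2, Decimal("4")), (1, 3, Decimal("1.0625")),
    (2, 1, Decimal("1.5")), (2, 2, Decimal("2.03125")),
    (2, 3, Decimal("4.75")), (2, 4, Decimal("1")),
]

def roll_up(rows, sep=", "):
    """Collapse per-cell rows into one delimited string per part, in cell order."""
    by_part = defaultdict(list)
    for part_no, cell_no, size in sorted(rows):  # sorts by part_no, then cell_no
        by_part[part_no].append(str(size))
    return {part: sep.join(sizes) for part, sizes in by_part.items()}

option2 = roll_up(option1_rows)
print(option2)  # {1: '1.0625, 4, 1.0625', 2: '1.5, 2.03125, 4.75, 1'}
```

Whatever separator and number format the roll-up produces effectively becomes part of the schema, because every reader has to parse it back the same way.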
This is generally frowned upon because the column no longer holds a single atomic value, so the table is not even in first normal form. However, it would greatly reduce the number of records created on each revision.
The cell sizes (and comparable dimensional fields) rarely, if ever, need to be queried individually. 99% of the time they are viewed as a whole, in which case the list format is fine. An answer to the related question "Is storing a delimited list in a database column really that bad?" lists this as one of the cases where storing multiple values in a single column may make sense.
I also wonder whether a single varchar column holding the list ("Cell Size" in Option 2) would take up less space than an entirely separate table (Option 1), even though most of the columns in that table would only be tinyint or decimal.
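A rough estimate of the space question follows. It assumes SQL Server's documented storage sizes: int is 4 bytes, tinyint is 1 byte, decimal with precision 1 to 9 is 5 bytes, and varchar is data length + 2 bytes. It also assumes the row-size formula from Microsoft's guidance on estimating clustered-index size. The estimate ignores non-leaf index pages, page headers, fill factor, compression, and nonclustered indexes. The column choices, such as `DECIMAL(7,5)`, and the helper names are hypothetical.

```python
# Assumed SQL Server storage sizes (from the data type documentation).
INT_BYTES, TINYINT_BYTES, DECIMAL_P1_9_BYTES = 4, 1, 5
VARCHAR_OVERHEAD = 2   # varchar stores actual data length + 2 bytes
ROW_HEADER = 4         # per-row header
SLOT_ARRAY_ENTRY = 2   # per-row entry in the page's slot array

def option1_row_bytes():
    """Approximate bytes for one (product_id, revision, part_no, cell_no, cell_size) row."""
    fixed = 2 * INT_BYTES + 2 * TINYINT_BYTES + DECIMAL_P1_9_BYTES  # 15
    num_cols = 5
    null_bitmap = 2 + (num_cols + 7) // 8  # column count + bitmap = 3
    return fixed + null_bitmap + ROW_HEADER + SLOT_ARRAY_ENTRY  # 24

def option2_column_bytes(cell_list: str):
    """Approximate extra bytes the delimited varchar adds to a part row.

    Assumes the part row already has another variable-length column, so the
    2-byte variable-column count is not charged to this column.
    """
    return len(cell_list.encode("ascii")) + VARCHAR_OVERHEAD

for text in ["1.0625, 4, 1.0625", "1.5, 2.03125, 4.75, 1"]:
    cells = len(text.split(", "))
    print(f"{cells} cells: option 1 ~ {cells * option1_row_bytes()} B, "
          f"option 2 ~ {option2_column_bytes(text)} B")

# Worst case from the question: 6 parts x 20 cells in one cell-level table.
worst_rows = 6 * 20
print(f"worst case per revision: {worst_rows} rows ~ {worst_rows * option1_row_bytes()} B")
```

Under these assumptions, Option 1 uses several times more bytes per part: 72 B versus 19 B for part 1. In absolute terms, though, the worst case is only about 2.8 KB per cell-level table per revision.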
The primary downside I see to Option 2 is that I would lose the data types of those dimensional fields, which could become a problem. Additionally, if a future requirement needed those dimensional values individually, I would have to parse them out first. However, I can't foresee many scenarios where that would be the case.
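Losing the data type means that the check a `DECIMAL` column would enforce at insert time moves into every piece of code that reads the list. Below is a minimal Python sketch of that parsing and validation step. The function name `parse_cell_sizes` is hypothetical.

```python
from decimal import Decimal, InvalidOperation

def parse_cell_sizes(text, sep=","):
    """Split a delimited Cell Size string into Decimal values.

    Raises ValueError on an empty, non-numeric, or non-finite entry, i.e. the
    kind of bad data a typed DECIMAL column would have rejected on insert.
    """
    values = []
    for position, raw in enumerate(text.split(sep), start=1):
        item = raw.strip()
        try:
            value = Decimal(item)
        except InvalidOperation:
            raise ValueError(f"cell {position}: {item!r} is not a number") from None
        if not value.is_finite():  # rejects 'NaN', 'Infinity', etc.
            raise ValueError(f"cell {position}: {item!r} is not a finite number")
        values.append(value)
    return values

sizes = parse_cell_sizes("1.5, 2.03125, 4.75, 1")
print(sizes[1], max(sizes))  # 2.03125 4.75
```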
What factors am I not considering?