0

Should I roll up multiple values into a list within a single column or use a separate table?

In SQL Server, I have a table that stores product information. This product table has a child table that stores information specific to each part within the product, including dimensions, tooling, etc. Some of these fields, such as tooling, cell sizes, etc. have multiple values. I was planning to store these types of fields in a child table that references the Product's ID. This table would look something like this for a single product ...

Option 1

Product ID Part No Cell No Cell Size
1 1 1 1.0625
1 1 2 4
1 1 3 1.0625
1 2 1 1.5
1 2 2 2.03125
1 2 3 4.75
1 2 4 1

... where the number of cells per part could be 2-20 and the number of parts per product could be 1-6. Each part within the product has approximately 3-7 fields that are similar to the situation described above.

Every time a product was revised we would simply overwrite the data, increment the product's revision number, and manually add a comment indicating the types of changes that were made. However, we would like to start capturing the full dataset with each revision. So, instead of overwriting a record when changes are made, we will create a new record with a new revision number for a product each time it is revised. This means creating new records for each of the product's parts and a part's dimensions.

If we split out Cell Size dimensional information (and similar data) into their own table, this could end up being a huge number of records that are recreated upon revising the product. So, I considered rolling up the cell sizes (and similar data) into a list, which would be stored in a single column within the part table, like this ...

Option 2

Product ID Part No Cell Size
1 1 1.0625, 4, 1.0625
1 2 1.5, 2.03125, 4.75, 1

This is generally frowned upon as the data is no longer normalized; however, it would really reduce the number of new records created upon each revision.

The cell sizes (and comparable dimensional fields) rarely (if ever) need to be queried individually; 99% of the time they are only seen as a whole, in which case the list formatting is fine. In an answer from related question Is storing a delimited list in a database column really that bad? this was listed as one of the reasons why multiple values in a single column may make sense.

I wonder if a single varchar() field listing the data ("Cell Size" in Option #2) would take up less space than creating an entirely new table (option #1), even if most of the data types in that table are only tinyint or decimal.

The primary downside I see to the second option is that I would lose the data types of those dimensional fields, which has potential to be an issue. Additionally, if there were a situation in the future that would require those dimensional fields to be seen individually, I'd have to parse those out first; however, I can't foresee many scenarios where that'd be the case.

What are factors that I'm not considering?

philipxy
  • 14,867
  • 6
  • 39
  • 83
Leah
  • 77
  • 5
  • The first option is better, because it is separated, because its management is better and it is cleaner, and it is easier to register. It is only difficult to output it, and you can create a view. – abolfazl sadeghi May 08 '23 at 15:17
  • 1
    Out of curiosity, why do you feel a need to reduce the number of DB records? You wrote _"it would really reduce the number of new records created upon each revision"_ - but I don't see why that's somehow something to be avoided... – Dai May 08 '23 at 15:18
  • Also, why are you doing data revision tracking manually in your DB? Is there a reason you're not using Temporal Tables? – Dai May 08 '23 at 15:18
  • 3
    Yes: read Codd's seminal paper on RDBMS - then **embrace** having "lots of records" because that's (honestly) probably a good thing: it means your data-model is likely accurately representing the complexity of the real-world (if you want/need a simpler user-visible representastion of the data then create a `VIEW` - that way you won't compromise your actual stored data representation) - also, stop using legacy things like change-data-capture and start using Temporal Tables. – Dai May 08 '23 at 15:42
  • @Dai Regarding reducing the number of rows, this was purely a consideration for space/storage purposes. We have appr. 600-700 new products created monthly + probably a third of those will end up being revised 1 or more times. – Leah May 08 '23 at 15:44
  • But storage space is _so_ cheap and local storage bandwidth is _insane_ now compared to 5-6 years ago: I can buy a 1TB NVMe SSD for like $40: this isn't the 1990s anymore - anyway an RDBMS holds textual data - and it sounds like you're storing maybe a couple KiB per Product? I don't see how any of this could cause any kind of problem unless your DBA is actively trying to get fired (you've got indexes, query-store, auto-tuning, XEvents, etc) - having a high-complexity RDBMS model in no way shape or form implies a trade-off with performance. – Dai May 08 '23 at 15:52
  • In option 1, you don't need to store all the `CellSize` rows again. You can just store the changed ones if any. This would be separate from the tracking of the main table. – Charlieface May 08 '23 at 15:58
  • @Dai Thanks for all the advice! I had seen Temporal Tables pop-up in some of my research and will look into using those per your recommendation. I also appreciate the point on storage space being cheap. This had been my perspective on things; however, I started to get intimidated by how quickly this could grow. So, in summary, is it fair to say that I ought to proceed with Option 1 and use Temporal Tables for tracking historical data? – Leah May 08 '23 at 15:59
  • 1
    I'm still trying to understand what problem you think you're solving here though - – Dai May 08 '23 at 16:03
  • I re-read your post and I take back my remarks about EAV - I jumped to that conclusion based on a quick glance of your table example but I didn't actually check the details. Provided that this "Cell Size" column is an actual real-world attribute of a product and other real-world attributes are stored in distinct columns - including in other additional dimension tables if there's a multiplicity difference) - so If Option A means to continue using a normalized database (up-to 5th Normal Form, I trust? (6NF and beyond still isn't feasible today IMO) then that's what I'd recommend. – Dai May 08 '23 at 16:06
  • @Charlieface Interesting point. I was originally thinking that each child table would always match the parent table's rev number; however, this consideration is pretty much irrelevant with the use of temporal tables since those child tables would be updated individually on an as-needed basis and the use of temporal tables would ensure the most current data is used. Thanks! – Leah May 08 '23 at 16:10
  • Codd's paper is here btw: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf (yes, it's from 1970) – Dai May 08 '23 at 16:10
  • @Leah Did I forget to say that Temporal Tables are old-hat now? Now we're all using _Ledger Tables_, which extend Temporal Tables with THE BLOCKCHAIN (ohgod) and the inability to delete anything! – Dai May 08 '23 at 16:11
  • [How can you represent inheritance in a database?](https://stackoverflow.com/q/3579079/3404097) [More](https://stackoverflow.com/q/190296/3404097) [How to design a product table for many kinds of product where each product has many parameters](https://stackoverflow.com/q/695752/3404097) [More](https://stackoverflow.com/a/2945124/3404097) [And more](https://stackoverflow.com/q/5106335/3404097) [Re EAV](https://stackoverflow.com/a/23950836/3404097) etc etc etc [Re SQL "performance".](https://stackoverflow.com/a/24196511/3404097) etc – philipxy May 08 '23 at 17:26
  • Please clarify via edits, not comments. Please ask 1 specific researched non-duplicate question. "Any recommendations" is not an on-topic question. Please ask a new (single specific researched non-duplicate) question in a new post, not in comments. [Help] [meta] [meta.se] – philipxy May 08 '23 at 18:06
  • 1
    I'm confused about what you are actually asking. Do you want to know whether to store delimited data in SQL (usually bad)? Do you want to know how to keep tha data types if you do that (use JSON/XML maybe)? Do you want to know how to do temporal tables? Please stick to one question, and cut down your post to a readable length. – Charlieface May 08 '23 at 19:19
  • Please don't insert "EDIT"s/"UPDATE"s, just make your post the best presentation as of edit time. Please see my edit. Please act on the feedback. Please reorganize to "Please ask 1 specific researched non-duplicate question." [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461) There is no "best". What is your definition of "better", in enough detail that others would agree on valuations? And then how are you stuck evaluating for yourself? – philipxy May 09 '23 at 19:06

0 Answers0