
I apologize if I used "data patterns" incorrectly. Here's some background: I'm porting an Access database to a web-based MySQL app. Here's what we are tracking.

We've got machines with up to 16 heads. Each head has three items associated with it: two are integers, one is a short text string. Each production order uses at least one head; some use all 16, some use only one. If more than one head is used, we track the order in which they are used. Each production order also has a few short-to-medium-length fields that are stored as well. The vast majority of production runs use fewer than half of the available heads.

Currently the data is in an Access database that stores everything in one table, so each row has 6 general fields plus 16 × 3 = 48 head fields, for a total of 54 columns. The only fields that are searched are the second two, which are integers.

id|workorder|partnumber|note|machine|reference|head1spec1|head1spec2|head1spec3|head2spec1|head2spec2|head2spec3|...etc to head 16

I'm aware that there is a lot of dead space in there, since each row contains 16 elements that could be broken out into a separate table and joined to display results. It's been accumulating data for about 10 years; right now the file size of the Access DB is 60.8 MB.

Here's my question: is there any real-world advantage to normalizing (maybe not the correct usage) in this case, since none of that data is used for searching, and having it all in one row is sort of a natural state for that information?

Randy Kilwag

2 Answers


Yes, there are real-world advantages, but I don't think they are enough to warrant modifying your existing Access schema. Rather, I would put my energy into migrating to a better platform, if possible, e.g., web-based with SQL Server back-end. You can worry about the schema while doing that migration.

A normalized schema will help with things like:

  • Data integrity: ensure the same head or head spec is not used twice for the same machine (unless that is valid, of course...)
  • Querying: easily count, for example, the most commonly used head spec
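To make the querying point concrete: with a normalized head table, "most common head spec" is a single GROUP BY, whereas the flat 54-column schema would need a 16-way UNION over the head columns. A minimal sketch, using SQLite as a stand-in for MySQL; the table and column names here are illustrative assumptions, not the asker's actual schema:

```python
# Counting the most common head spec in a normalized schema.
# SQLite stands in for MySQL; all names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production_order (id INTEGER PRIMARY KEY, workorder TEXT);
CREATE TABLE order_head (
    order_id INTEGER REFERENCES production_order(id),
    position INTEGER,                 -- order the head was used in (1..16)
    spec1 INTEGER, spec2 INTEGER, spec3 TEXT,
    PRIMARY KEY (order_id, position)  -- a position can't repeat per order
);
""")
conn.executemany("INSERT INTO production_order VALUES (?, ?)",
                 [(1, "WO-1"), (2, "WO-2")])
conn.executemany("INSERT INTO order_head VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 100, 5, "a"),
                  (1, 2, 100, 7, "b"),
                  (2, 1, 200, 5, "c")])

# One GROUP BY instead of UNIONing head1spec1..head16spec1.
row = conn.execute("""
    SELECT spec1, COUNT(*) AS n FROM order_head
    GROUP BY spec1 ORDER BY n DESC LIMIT 1
""").fetchone()
print(row)  # → (100, 2)
```

The composite primary key also gives you the first bullet (integrity) for free: the database rejects a duplicate position for the same order.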

You can do those things with the schema you have now; it just takes a little more work. But this schema has been working for 10 years, so what is the business case for change?

D'Arcy Rittich
  • Web-based MySQL is where it's headed. The current structure sort of takes care of your first bullet point by itself: the exact head isn't important, only the order in which they were used. You'd have to intentionally enter them out of order or accidentally skip one to mess it up. As for the second bullet point, I can't think of an instance where that would ever be useful to know. Never say never, though. Thanks for your response. – Randy Kilwag Oct 29 '12 at 20:21

I'm aware that there is a lot of dead space in there, ...

Not really. I'm not privy to how Access does it, but most databases are fairly efficient at storing NULLs (typically one byte, but it may be as low as one bit, as in the case of MS SQL Server).

...since each row contains 16 elements that could be broken out into a separate table and joined to display results. It's been accumulating data for about 10 years; right now the file size of the Access DB is 60.8 MB.

You didn't say how many rows have accumulated over these 10 years, but 60.8 MB is peanuts in database terms, even for a "light" database such as Access.

Space is not your problem, and since the whole database easily fits into the memory of today's hardware (or even hardware from 10 years ago), speed is probably not your problem either.

Here's my question: is there any real-world advantage to normalizing (maybe not the correct usage) in this case, since none of that data is used for searching, and having it all in one row is sort of a natural state for that information?

The advantage (of splitting into two tables in a 1:N relationship) would be better flexibility in case you need to support different machines with different numbers of heads. Also, writing queries that search, sum, or average the data over all heads might be simpler.
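The 1:N split described above could look like the sketch below, with a parent order table and a child head table that only stores rows for heads actually used. This is a minimal illustration using SQLite in place of MySQL; the table and column names are assumptions based on the asker's description, not their real schema:

```python
# 1:N split: parent production_order, child order_head, JOINed back
# to reconstruct one order's heads in usage sequence.
# SQLite stands in for MySQL; all names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production_order (
    id INTEGER PRIMARY KEY,
    workorder TEXT, partnumber TEXT, note TEXT,
    machine TEXT, reference TEXT
);
CREATE TABLE order_head (
    order_id INTEGER REFERENCES production_order(id),
    position INTEGER,      -- usage order; unused heads simply have no row
    spec1 INTEGER, spec2 INTEGER, spec3 TEXT,
    PRIMARY KEY (order_id, position)
);
""")
conn.execute("INSERT INTO production_order VALUES "
             "(1, 'WO-1', 'P-9', '', 'M1', '')")
conn.executemany("INSERT INTO order_head VALUES (1, ?, ?, ?, ?)",
                 [(1, 10, 20, "x"), (2, 11, 21, "y")])

# Rebuild the order with its heads in sequence via a JOIN -- this is the
# extra JOINing (and duplicated order_id values) the split costs you.
rows = conn.execute("""
    SELECT o.workorder, h.position, h.spec1, h.spec2, h.spec3
    FROM production_order o
    JOIN order_head h ON h.order_id = o.id
    WHERE o.id = 1 ORDER BY h.position
""").fetchall()
print(rows)  # one row per used head, 2 here instead of 16 mostly-NULL slots
```

Note the trade-off visible in the sketch: an order using 2 heads stores 2 child rows instead of 48 mostly-NULL columns, at the cost of repeating the `order_id` key and JOINing on every read.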

The disadvantage would be more space required (since the child table would need to store a copy of the PK values from the parent table) and more need for JOINing.

All in all, your existing design looks fine to me. Are there any specific problems you have not mentioned in your question that you are trying to solve?

Branko Dimitrijevic
  • No specific problems, just trying to figure out the best course of action since the web/mysql conversion will likely remain in place for the next decade. So far there are about 65k rows. Each row supports different machines with different # of heads, we just don't populate unused heads. The main purpose is tracking the sequence order to recreate the job on any machine. I wanted to see if there was any real benefit to splitting the data into two tables. This seems to be a rare case where best practice is leaving well enough alone. Aside: Raw data from Access converted to csv is under 15MB! – Randy Kilwag Oct 30 '12 at 18:00
  • @RandyKilwag _"Raw data from Access converted to csv is under 15MB!"_ I'd be surprised if the difference to 60.8 MB was due to NULLs - more probable reason is simply the overhead of database structures such as indexes and fragmentation. Have you tried [compacting](http://stackoverflow.com/a/74537/533120) your database? Also, are you using fixed-width types (i.e. CHAR vs VARCHAR)? Some databases store NULLs in fixed-width types inefficiently... – Branko Dimitrijevic Oct 31 '12 at 00:16