Advantage of splitting a table

Question

My question may seems more general. But only answer I got so far is from the SO itself. My question is, I have a table customer information. I have 47 fields in it. Some of the fields are optional. I would like to split that table into two customer_info and customer_additional_info. One of its column is storing a file in byte format. Is there any advantage by splitting the table. I saw that the JOIN will slow down the query execution. Can I have more PROs and CONs of splitting a table into two?

Is this table *normalized*? Could you list the optional columns by name/purpose? — Beryllium, May 31 '13 at 07:17

score 5 · Accepted Answer · answered May 31 '13 at 05:36

I don't see much advantage in splitting the table unless some of the columns are very infrequently accessed and fairly large. There's a theoretical advantage to keeping rows small as you're going to get more of them in a cached block, and you improve the efficiency of a full table scan and of the buffer cache. Based on that I'd be wary of storing this file column in the customer table if it was more than a very small size.

Other than that, I'd keep it in a single table.

Aheho · Answer 2 · 2013-05-31T12:54:42.400

I can think of only 2 arguments in favor of splitting the table:

If all the columns in Customer_Addition_info are related, you could potentially get the benefit of additional declarative data integrity that you couldn't get with a single table. For instance, lets say your addition table was CustomerAddress. Your business logic may dictate that a customer address is optional, but once you have a customer Zip code, the addressL1, City and State become required fields. You could set these columns to non null if they exist in a customerAddress table. You couldn't do that if they existed directly in the customer table.
If you were doing some Object-relational mapping and your had a customer class with many subclasses and you didn't want to use Single Table Inheritance. Sometimes STI creates problems when you have similar properties of various subclasses that require different storage layout. Being that all subclasses have to use the same table, you might have name clashes. The alternative is Class Table inheritance where you have a table for the superclass, and an addition table for each subclass. This is a similar scenario to the one you described in your question.

As for CONS, The join makes things harder and slower. You also run the risk of accidentally creating a 1 to many relationship. I.E. You create 2 addresses in the CustomerAddress table and now you don't know which one is valid.

EDIT: Let me explain the declarative ref integrity point further.

If your business rules are such that a customer address is optional, and you embed addressL1, addressL2, City, State, and Zip in your customer table, you would need to make each of these fields Nullable. That would allow someone to insert a customer with a City but no state. You could write a table level check constraint to cover this situation. But that isn't as easy as simply setting the AddressL1, City, State and Zip columns in the CustomerAddress table not nullable. To be clear, I am NOT advocating using the multi-table approach. However you asked for Pros and Cons, and I'm just pointing out this aspect falls on the pro side of the ledger.

I don't follow the declarative integrity argument there -- can you explain further? If you're talking about check constraints I would have thought that the single table would have the advantage as it would give greater flexibility over the combinations of columns that you could include in a constraint. — David Aldridge, May 31 '13 at 05:32
If your business rules are such that a customer address is optional, and you embed addressL1, addressL2, City, State, and Zip in your customer table, you would need to make each of these fields Nullable. — Aheho, May 31 '13 at 12:43
Ah I see -- I don't think that it would be too controversial to put something like that in its own table — David Aldridge, May 31 '13 at 16:03

score 2 · Answer 3 · edited May 23 '17 at 12:19

I second what David Aldridge said, I'd just like to add a point about the file column (presumably BLOB)...

BLOBs are stored up to approx. 4000 bytes in-line¹. If a BLOB is used rarely, you can specify DISABLE STORAGE IN ROW to store it out-of-line, removing the "cache pollution" without the need to split the table.

But whatever you do, measure the effects on realistic amounts of data before you make the final decision.

¹ That is, in the row itself.

Advantage of splitting a table

3 Answers3