3

Should this be represented in the database as 1 table or 3 tables? I and my friend have different opinions about this so I'd like to see the general views on this. (Maybe it should be a vote for either solution?)

Create Table Order
// Basic fields of the table
 - ID (Primary key)
 - CustomerID  (integer, with a FK)
 - Quantity
 - ProductID  (integer, with a FK)

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

1 table approach:

Pros:

  • performance (one insert as opposed to two, FK check, no joins)
  • probably takes less space (the extra tables have overhead + indexes + extra ID field)
  • one table as opposed to three
  • hardly justifiable to have split out to new tables just for 2+3 fields (or what?)

Cons:

  • Nullable fields
  • Potentially extra "type" column (can be skipped)
  • Breaks 3NF (?)

Pros and cons kindly requested as well as personal opinions. :)

EDIT: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. I.e. focus on the technical aspects more than the domain model please.

AtliB
  • 1,263
  • 1
  • 18
  • 30
  • This question has been asked too many times already. Please have a look at some of the reponses to questions in this list http://stackoverflow.com/search?q=%5bdatabase-design%5d%20%22multiple%20tables%22&tab=relevance . If you still want to ask your question, please explain why your scenario is special. – APC Jul 12 '10 at 13:30
  • Well, I feel it's a bit different. 1) The above is a bit specific (few columns as opposed to some silly 100+ columns table) 2) It would not make sense to structure this with any inheritance (as it's not) 3) It's a pure 1:1 mapping 4) The data in "side" tables won't ever be linked to or used elsewhere (as opposed to the "how to build a Comments table" discussion) – AtliB Jul 12 '10 at 22:50
  • This one is however a close cousin: http://stackoverflow.com/questions/1290721/database-table-with-just-1-or-2-optional-fields-split-into-multiple-tables – AtliB Jul 12 '10 at 23:00
  • It appears that the model is already broken anyway... Order has a CustomerID, so all of InternalAccountID, InternalCompanyID, and ExternalAccountNumber, ExternalCompanyName, ExtraInformation, probably are duplicated from some Customer table... – pascal Jul 13 '10 at 05:16
  • >It appears that the model is already broken anyway< Nah, not really because a customer could have multiple accounts. Anyway, this is an fictitious problem (so the column names are made up). The real question is when to leave the fields in the table or break it out into a new table (in this case 2 additional tables since these are mutually exclusive fields) – AtliB Jul 13 '10 at 12:31
  • Performance isn't necessarily a pro here. INSERT/UPDATE may eventually suffer as has to rebuild indexes on a single bloated table instead of 3 tightly-defined ones. Varies per database & table type and probably won't matter until the table is huge, but nothing comes without side effects. – tadamson Jul 29 '10 at 06:11
  • If customer can have multiple accounts, why not associate account with a customer, then associate order with an account? Order would then be transitively associated with a customer (assuming that customer must have at least one either internal or external account). – RMorrisey Aug 03 '10 at 01:21
  • @RMorrisey, yes, it would be possible to remove the customerId like you suggest. But this is a made up example (which I see is not perfect as such) - but I'm more interested in the 1 table vs. 3 tables dilemma. – AtliB Aug 03 '10 at 10:36

7 Answers7

4

Hope this is self-explanatory.

order_model_v1

Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
  • Thanks for the diagram! Yes, this is how you could set up the logical setup for the tables (as well as physical sure). But when realizing this design, should order+internalorder+externalorder be 1 or 3 tables and why? In principle there isn't actually an inheritance relationship between these entities (at least not from a real-world perspective) – AtliB Aug 03 '10 at 10:43
  • Over here it's shown as three, Order table having all common columns, and Internal and External Orders being some special sub-type of an order. You could probably move everything to one table by refactoring your Company and Accounts tables to hold internal and external companies. – Damir Sudarevic Aug 03 '10 at 11:01
3

My opinion would be that if

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

are always 1:1 with an order (i.e., you can't have 3 accountIDs), then leave it as one table. To take care of your null issue, you could add one more column called InternalCustomer (boolean) or CustomerType (varChar) that you could use to define an internal or external customer to know which of the two sets of fields you should look at for a specific customer.

Since we don't know the full use of this data or the schema for the entire DB, any response on this can't really be fully qualified.

Tommy
  • 39,592
  • 10
  • 90
  • 121
  • yes, it's a pure 1:1 relationship and this extra data only relates to the order itself (no other objects need this or this kind of data) – AtliB Jul 12 '10 at 22:53
0

If you want to avoid data duplication, you should go with a 2- or 3-table solution. For example, if you have the External columns in the Order table, value could exist multiple times. If the data looks like this:

ID   ExternalCompanyName
1    ACME
2    ACME
3    My Company
4    ACME

Now, if ACME changes names to ACME, Inc. you must update many rows. If the tables are normalized, where external companies are in a separate table, you would update one row. Note, there may be an argument to put Account Number in it's own table, but we'll leave that for extreme normalization.

It doesn't appear to be a 1-to-1 relationship between an order and a company/account, unless each company/account can only have one order. it sounds more like a 1-to-many relationship.

Now, what happens if a mistake is made when updating the ExternalCompanyName in a single-table environment, and only some of the rows get updated. You have some rows with ACME and some rows with ACME, Inc. You end up with a bad-data situation.

Also, if this is really a 1-to-many relationship, you really aren't saving space. You are duplicating data in the order, rather than storing it once in another table.

bobs
  • 21,844
  • 12
  • 67
  • 78
  • Thanks, but the purpose of the External** columns is actually to have ad-hoc information that should not change. If I wanted to reuse some data, I would set up an internal account and use those instead. – AtliB Aug 03 '10 at 10:50
0

As the volume increases selection from two tables may be a lot faster than one. Sometimes this kind of refactoring (partition) is done on mature databases to increase performance.

Imagine using this for a multi table join, where some criteria are on this table, but others are in different tables.

select from order join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

It may end up fetching all order rows for the dates from disk, then throwing many of them away because they don't match the join. This fetch from disk is bound to be slow and may to spoil your RAM cache.

select from order join order_detail using (order_id) join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

In this case when it loads all order rows from disk it's not going to hurt as bad as previously, because the table is narrower and smaller. It doesn't need to load all the lengthy fields which are irrelevant for filtering. Eventually, after join to customer, it will only fetch those order_detail rows which match all criteria.

If you expect this to be large, you should consider splitting the table so that the fields which are most critical for searches are in one table, and "data" fields in other one-to-one table(s).

The bottom line is: Normal form and domain is one thing, but performance often requires tradeoffs. You can hide some of them (cover the split with a view), but not all (duplicate/aggregate fields for the sake of faster selects).

Konrad Garus
  • 53,145
  • 43
  • 157
  • 230
  • In my sample, I would actually always filter on the base-fields - never from the "side-tables". So searching fields would most probably be more expansive with 2 or 3 tables than 1. – AtliB Aug 03 '10 at 10:53
  • Right, so that's why 2-3 tables is likely to be faster. – Konrad Garus Aug 03 '10 at 11:02
  • Well, normally you would be going through an index (which is a lot slimmer than the table itself). From the index you would get the "row locator" and get the entry directly from the single table. If there were more tables, that would be another fetch thus slowing things down. I can't really imaging what might make it faster with multiple tables (assuming properly indexed tables). – AtliB Aug 03 '10 at 12:51
  • Did you read my example? That's right, index helps you locate the rows. But if you have more complex queries (using other fields, or joins, which don't use indexes), it may fetch more rows from disk than necessary for the additional criteria. In this case the narrower the table the better. – Konrad Garus Aug 03 '10 at 13:18
  • Sure did, and I said that "I would actually always filter on the base-fields" which are pretty selective. But I agree that if you filter by a non-indexed column (and are therefore forced to take a table scan hit), then sure it would probably be better. – AtliB Aug 03 '10 at 13:53
  • Even when you filter on an indexed column, planner may choose not to do an index scan. Especially if you filter on a few indexed columns at a time, it may choose to fetch all rows for one index and then scan them for additional criteria. One situation when this often happens is when you do an `OR` or `IN` with many alternatives, or `JOIN`. – Konrad Garus Aug 03 '10 at 14:21
0

I would absolutely not go with the 3-table solution. By breaking this data into 3 tables, you really can't have any queries return a full order header without joining with the foreign key, and every insert of a new order updates multiple tables and indexes, which is a problem for concurrency. I would suggest using 2 tables, one for InternalOrders and one for ExternalOrders. For those cases where you need a consolidated query of data from both sets of orders, define a view that is the union of both tables.

I am surprised to see product id and quantity as part of the order header. Every order tracking database I've ever seen breaks out order items as a separate table, using the order id as a foreign key, so that a single order can include multiple products (or the same product with various quantities, delivery times, etc.).

PaulMcG
  • 62,419
  • 16
  • 94
  • 130
  • Regarding you last point: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. – AtliB Aug 03 '10 at 10:49
  • There are no problems in concurrency updating multiple tables at once, since there are database transactions, don't forget them. – Illarion Kovalchuk Aug 03 '10 at 11:50
  • @Shaman - Yes, I'm familiar with the concepts of transactions. The concurrency issue is that, for the duration of my transaction, I may be blocking other innocent bystanders who aren't really in my way, but get held up because I'm updating some index block that we happen to share. The more tables and indexes I update within my transaction, the more likely I am to get in someone else's way. *That* is what I mean by "concurrency problem." – PaulMcG Aug 03 '10 at 19:26
0

I'm no purist, so 3nf is good when it makes sense...but you don't have to take it for granted that it always will.

From a pragmatic standpoint, what is your goal? Your pros/cons list is a good start. I'd add a few more ideas to the list -- as you see fit.

1) Will any other table in your database need to relate (e.g., join) to this data? That's kind of the point of an RDB.

2) Will your database grow? Even if 1 table makes sense now, will it always make sense? You'll regret it, if you find yourself wanting to add more tables, and your non-normalized table is forcing you to 'work-around' it, dealing with extra rows returned, slower execution times, etc.

3) What happens when your customer gets a new external account, or what have you. Will you create a brand-new record? How will you answer questions like "What is customer so-and-so's account number?".

...

I think in general, i go for scalable, which in this case may mean 3nf. 1 table is easier to deal with in a very narrow scope, but if anything ever changes, you'll be dealing with "How do I split this table into properly related 3nf tables, without messing up all of the dependencies that have been created on it?". That one is no fun.

dave
  • 1,344
  • 10
  • 16
  • 1) no, no other table will need to related to this table / these tables. 2) yes, but not significantly (maybe one million rows in a few years). 3) The external accounts are thought of as "ad-hoc" accounts so they never need to be used again. If you want a permanent one, you would set up an internal account. – AtliB Aug 03 '10 at 10:40
  • OK -- If I understand what I've been reading so far (here and in other answers), you would do well with 4 tables -- here's why: 1) Everyone needs a customer ID (both internal and external) so they can be associated with an order. So a "customer" table is required. 2) As the purpose is to track orders, so a "orders" table is requored. 3) Presumably, your "internal" customers will re-use their accounts for future orders, so you'll need to be searching that table a lot when they log-in, or however you set it up. So an "internal accounts" table will be important. That leaves "external"... – dave Aug 03 '10 at 15:54
  • ...accounts. I have some question as to why you'd need to go through the trouble of creating external account numbers. (what will they be used for -- will you search for them in the future?) External company name can go in the customer table, as well as extra info. Anyway, I'd either put external customer info in the customer table entirely, and just implicitly "know" they're external because they don't have an internal account associated with them. If you must have "account" infofor your ext. customers, an "external accounts" table would lighten the load for searches on the other tables. – dave Aug 03 '10 at 15:58
0

Is account information associated with the customer before he can order (i.e. you have enother table where you track which account ID-s given CustomerID can use)? Can you abstract all accounts to a reasonably uniform schema (that one can have a few nulls) fo that you have one universal AccountId (surrogate key) and then Account's table has say 3 varchar fields and one that tracks the kind of the account (used for billing etc.) ?

If you can do that then your order tracks just one AccountID since the order (as an entity) really doesn't care which payment method was used - it only cares that it's a legit/existing/approved AccountId for that user. Everything else is someone else's business so to speak (billing or checking funds etc.) and that enity and it's processing will need more data anyway.

This keeps your Order clean and null-free and facilitates separation of concerns as well.

Conceptually, your Order is really so called fact table - carrying only numbers and FK-s, small in item size but with a huge number of them.

So:

 Table Order (
     - OrderId
     - Quantity
     - ProductId
     - DiscountId -- sonner or latter :-)
     - AccountId
     - PaymentStatus -- probaly FK as well or predefined constant
 )

 Table Account (
     - AccountId
     - BillingInfo  -- akka ext acct number as text
     - PrincialName -- akka ext company name, some equivalent for internal acct-s
     - AdditionalData
 )
ZXX
  • 4,684
  • 27
  • 35