Database Design: Explain this schema

Question

Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.

Barry Williams from databaseanswers has this schema posted.

Clients and Fees Schema

alt text

I am trying to understand the split of address tables in this schema. Its clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.

1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?

2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?

3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?

this url http://www.databaseanswers.org/data_models/clients_and_fees/index.htm does not work on my side — Felice Pollano, Jan 11 '11 at 22:27
Thanks to whoever posted the img directly in the question. I deleted it to edit the question when having problems tying to comment only to find out that comments were being blocked by my browser. — swisscheese, Jan 11 '11 at 23:17

score 2 · Answer 1 · answered Jan 11 '11 at 22:40

1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.

2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.

3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.

@ dportas Just looking for further clarification on what you mean here. Are the dates mapped to the clients in a way that Dani talked about in his response to this question, a person that has multiple residences during different times? Your answers to 1 and 2 really cleared up things for me. Thanks — swisscheese, Jan 11 '11 at 23:16

score 2 · Answer 2 · answered Jan 14 '11 at 23:21

2

3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?

No. But you did find a problem.

The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.

That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.

Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."

The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes a employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.

Can you see how to fix that?

answered Jan 14 '11 at 23:21

Mike Sherrill 'Cat Recall'

91,602
17
122
185

1

Almost forgot another observation--addresses don't have lines. Mailing labels have lines. – Mike Sherrill 'Cat Recall' Jan 14 '11 at 23:22
1

@Mike. Yes, and there are a hundred more problems like that. – PerformanceDBA Jan 15 '11 at 07:53
@ Mike, thanks for provoking thought here. “Can you see how to fix that?” I have been trying to think through this. After looking at the rest of the database and the content it is trying to capture initially I would think that a redesign to many of the tables would have to happen to capture what you might be suggesting. Safe to assume you are not alluding to that here? If so with further thought, I came up with adding a join table that would identify a person as a “client”, a member of the “staff”, or even both. Were you thinking of something else? Or have another suggestion? – swisscheese Jan 15 '11 at 23:57
As far as the address/shipping lines. It seems that WIlliams uses this model a lot. I also read elsewhere that Amazon follows the same type of model. Without introducing a lot of complexity are there benefits to not keeping addresses in a simple table? For me and my needs, I can only see the most complex query off this information to be how many from a certain city/state/ or country exist. – swisscheese Jan 16 '11 at 00:02
2

@swisscheese: If you mean "Are there benefits to modeling mailing labels instead of addresses?", then the answer is "Yes". If the entity your business is concerned with is mailing labels, then modeling mailing labels is a good idea. On the other hand, if your business is concerned with addresses, then modeling addresses is a good idea. Some businesses need to model both. – Mike Sherrill 'Cat Recall' Jan 16 '11 at 12:18
@swisscheese: I'm saying the same thing performanceDBA is saying below: "Customer and Staff should be normalised into a Person table, with all the elements identified." You'd need to redesign the tables that stored information about persons and clients. I answered a similar question and posted SQL here http://stackoverflow.com/questions/4688972/different-user-types-objects-own-content-in-same-table-how/4690648#4690648 – Mike Sherrill 'Cat Recall' Jan 16 '11 at 12:24
1

@swisscheese: I spoke too soon. I should have said that I'm saying almost the same thing as performanceDBA. I'm saying you probably need to implement a supertype, "Parties", and the two subtypes "Organizations" and "People" (pr "Persons"). I'd assume some clients are persons, and some clients are companies. – Mike Sherrill 'Cat Recall' Jan 16 '11 at 12:27
@Catcall, thanks for the responses and added info. My eyes are definitely opening up more and over the past couple of days I have really been taking a lot in. I will work through what you said about shipping/addresses and their differences and modeling both and see where that leads me. I need to see how things play out to get the real understanding although I understand the concept you brought to the forefront. – swisscheese Jan 17 '11 at 03:15

PerformanceDBA · Accepted Answer · 2011-01-17T00:20:39.063

Evaluation

First an Audit, then the specific answers.

This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problem like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
- Customer and Staff should be normalised into a Person table, with all the elements identified.
- And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
- He needs to Normalise the entire single "schema" first,
- then present the single normlaised data model in 500 sections or subject areas.
- I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.

The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.

Answer

He is using composite keys without spelling it out. The PK for client_addresses is client_id, address_id, date_address_from). That is not a bad key, evidently he expects to record addresses forever.
- The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.

The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.

Do not expect to identify any evidence of databases or design or Normalisation in this diagram.

Ok just read through your post, thanks for the added insight. I have read both good and bad about Barry and with the detail of your explanations I understand where you are coming from. Like I said the world of DBs are new to me IT is not. Understanding this missing piece of my knowledge has really gotten my attention so I would appreciate if you could point me to any sources of good information, unfortunately the DB peeps that I do know I don't trust to learn from. But always open to new friends. Would be cool to see a well designed large scale DB example to work from (cont.) — swisscheese, Jan 16 '11 at 07:07
(cont.) Do you or anyone else know where that exists? Where someone is willing to share such information? If not on here publicly, maybe directly with me? I understand the art and experience that comes into making a well laid out data warehouse and the subjective nature surrounding it. So any books, websites, people willing to share such info please pass them my way. I could continue to read the theory and design principles but it would be really helpful to see an example of something that actually works. — swisscheese, Jan 16 '11 at 07:08
@swisscheese. You will find that "good things no cheap, cheap things no good", which is why 99% of the info available free is rubbish & why large orgs pay for professional services. I have several large, long term Dbs implemented in banking & finance, all customer confidential. I am permitted to show those that are 10 years old, but in hard copy only. You might get some value by (go to my profile and) read any answer re database; then ask questions. Here's [**one with links**](http://stackoverflow.com/questions/4702728/relational-table-naming-convention/4703155#4703155), to get you started. — PerformanceDBA, Jan 16 '11 at 23:16
@swisscheese. 1) SO is not database-driven, the posts and comments are crawled. We get notifications of comments only if the commenter uses @Handle to open the comment. 2) If you wish to, you can un-choose an Answer, then choose another Answer. 3) Db Books. Read textbooks only, Codd & Date, Inmon. Stay away from Ambler, Martin, Kimball, and (for Dbs) any OO types. — PerformanceDBA, Jan 16 '11 at 23:38
@ PerformanceDBA ...."good things no cheap, cheap things no good" isn't this so true. I hope you see that I am not looking for handouts or free work. I am however looking to expand my tech knowledge from HW/Networks/Systems/web design to more programming and especially data warehousing for reasons not easily explained here. More looking to be steered into the right direction. I must say I have read a lot from you already and again I extend a true appreciation for your inputs and thoughts. I will tackle your pdfs that you posted on the other ?, @ first glance I was very happy to see them. — swisscheese, Jan 17 '11 at 03:08
@swisscheese. My pleasure. I have no problem with your attitude. Just read as much specific info on the subject, and ask specific questions. — PerformanceDBA, Jan 17 '11 at 09:58

score 1 · Answer 4 · answered Jan 11 '11 at 22:39

This 2 extra tables enables you to have address history per one person.

You can have them both in one table, but since staff and client are separated, it is better to separate them as well (b/c client id =1 and staff id =1 can't be used on the same table of address).

there is no "single" solution to a design problem, you can use 1 person table and then add a column to different between staff and client. BUT The major Idea is that the DB should be clear, readable and efficient, and not to save tables.

about 2 - the pk is combined, both clientID, AddressID and from. so if someone lives 6 month in the states, then 6 month in Israel, and then back to the states, to the same address - you need only 2 address in address table, and 3 in the client_address.

The idea of heaving the from_Date as part of the key is right, although it doesn't guaranty data integrity - as you also need manually to check that there isn't overlapping dates between records of the same person.

about 3 - no (look at 2).

Thanks Dani for the response, your answer along with dportas really cleared up things for me. — swisscheese, Jan 11 '11 at 23:21

score 0 · Answer 5 · answered Jan 11 '11 at 22:39

Viewing the data model, i think:

1) PF means that the field is both part of the primary key of the table and foreign key with other table.

2) In the same way, the primary key of Staff_Addresses is {staff_id,address_id,date_adderess_from} not just date_adderess_from

3) The same that 2)

score 0 · Answer 6 · answered Jan 11 '11 at 22:45

0

In reference to Staff_Addresses table, the Primary Key on date_address_from basically prevents a record with the same staff_id/address_id entered more than once. Now, i'm no DBA, but i like my PKs to be integers or guids for performance reasons/faster indexing. If i were to do this i would make a new column, say, Staff_Address_Id and make it the PK column and put a unique constraint on staff_id/address_id/date_address_from.

As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.

Hope this helps a little.

answered Jan 11 '11 at 22:45

Sergey Akopov

1,130
1
11
25

Sergei, I was thinking along the same lines as you here in reference to using integers as primary keys. After doing HW for so many years its time to master this programming stuff and from what I learned so far what you posted seems to be the consensus. Thanks for the input. – swisscheese Jan 11 '11 at 23:23
There's no clear justification for adding a single integer surrogate key here. Assuming the keys are indexed then adding the extra constraint would mean an extra index too. That's unlikely to be good for performance because the additional index has to be updated whenever rows are inserted/deleted. – nvogel Jan 12 '11 at 12:42

Database Design: Explain this schema

6 Answers6