1

I am re-creating a part of my company’s database because it does not meet future needs. Currently we have mainly a flat file and some disjoined tables that were never fully realized.

My way of thinking is we have a table for each category except maybe the zips table, which may serve as a connect it all together table. Please refer to image below:

Database Diagram http://www.freeimagehosting.net/uploads/248cc7e884.jpg

One thing I am thinking of is removing the zip table and just putting the zip code in the zipstocities table since the zip code is almost unique and then indexing the table on the zip code. The only downside is zip code has to be a varchar to take care of zip codes with leading zeros. Just want to know if there is a flaw in my logic.

Juha Syrjälä
  • 33,425
  • 31
  • 131
  • 183
Sean
  • 517
  • 4
  • 11
  • 34
  • 5
    What is this database to be used for? As it is this schema is over-normalised I would say. – Lazarus Jul 22 '09 at 16:13
  • It is used for adding patient information chief scenario: patient knows their zip code the remaining all other fields auto populate Possible other scenario: Patient only knows city and state they live in; zip code and county is then auto populated – Sean Jul 22 '09 at 16:40
  • Over-normalised? I think it is completly denormalised. – Daniel Brückner Jul 22 '09 at 16:46
  • 2
    Interesting article that might give you ideas: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th – djangofan Jul 22 '09 at 16:59

8 Answers8

4

I don't know the US ZIPcode and territorial devision system well, but I assume it's somewhat like the German one.

  • A state has many counties.
  • A county has many cities.
  • A city has many zip codes.

Hence I would use the following schema.


ZipCodes          CityZipCodes
------------      ----------------      Cities
ZipCode (PK) <─── ZipCode (PK)(FK)      -----------
                  City    (PK)(FK) ───> CityId (PK)
                                        Name
                                        County (FK) ───┐
                                                       │
                                                       │
                                     Counties          │
                                     -------------     │
              States                 CountyId (PK) <───┘
              -----------------      Name               
              StateId      (PK) <─── State    (FK)
              Name
              Abbreviation

Fixed for multiple cities per ZIP code.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Daniel Brückner
  • 59,031
  • 16
  • 99
  • 143
  • 1
    occassionaly though a zip has more than one city – Sean Jul 22 '09 at 16:45
  • In general, yes, but sometimes a zip code is shared between a city with one zip code and another with several - 94303 is one such, and the cities are in two separate counties, to boot. My address is in Palo Alto, CA 94303, which is in Santa Clara county; 94303 also covers (some of?) East Palo Alto, which is in San Mateo county. When entering my address into web forms, I periodically have to choose which county my zip code is in because of this. – Jonathan Leffler Jul 23 '09 at 00:19
  • 1
    @Jonathan This should cause no representation problems. 94303 can be assigned to Palo Alto and East Palo Alto using the CityZipCodes table. Both cities are then related to a county and this relation is independent from the zip code. – Daniel Brückner Jul 23 '09 at 09:20
2

One thing you should be aware of is that not all cities are in counties. In Virginia you are in either a city or county but never both.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 1
    I really did not want to hear that – Sean Jul 22 '09 at 16:48
  • What can I say, it was a state law they created to fight the desegration of schools. – HLGEM Jul 22 '09 at 17:46
  • BTW the source we get zip code data from puts city, state and zip all in the one table. You could add county (make it nullable) and then it is easy to query. Make a unique index on all 4 fields. – HLGEM Jul 22 '09 at 17:48
0

Looking at the diagram you have, the state table is the only one of the 4 outside tables that is really necessary. Lookup tables with just an ID and a single value aren't worth the effort. These relationships are designed to make a single value in the main table (ziptocities) refer to a set of related data in the lookup table (states).

Mentee
  • 51
  • 2
  • yes but a city may severl zip codes and a county may also have several cities. Also sometimes a zip has more than one city – Sean Jul 22 '09 at 16:35
  • let me try once again this time I will use english: yes, but a city may have several zip codes and a county may also have several cities. Also sometimes a zip has more than one city – Sean Jul 22 '09 at 16:42
  • I agree with that. However, with the ziptocity table you are mapping zips to a city, county, and state. My point is simply that if you are going to refer to zipID that is a placeholder for a single piece of information (i.e. zip code), why wouldn't you simply just use that single value instead of storing the value and a reference value. By using the reference method you use 1 entry for each row in ziptocities plus 1 entry for each possible value of zip. If you simply use the zip code value, you only use 1 entry for each row in ziptocities. This eliminates size and complexity from you DB. – Mentee Jul 22 '09 at 16:52
0

You'll need to ask yourself why you care about counties. In many states in the US, they have little importance beyond tradition and maps.

The other question will be how important will it be that the address be accurate? How many deaths will there be if important letters are not delivered in a timely manner (possibly many if the letter is about prescription drug recalls!)

You probably want to think about using data from the Postal Service, possibly using a product that corrects addresses. That way, when you get a good address, you'll be certain the mail can be delivered there - because the Postal Service will have said so!

John Saunders
  • 160,644
  • 26
  • 247
  • 397
  • Add them and make them optional. Remind him that they are not containers - they're not part of this hierarchy, if he's thinking that way. – John Saunders Jul 22 '09 at 17:11
0

There seem to be flaws in both your process and your logic.

I suggest that you stop thinking about tables and relationships for a moment. Instead, think about facts. Make a list of valid addresses that your database needs to support. Many surprises await you.

Don't confuse an address with a mailing label. They're not at all the same thing. Consider modeling carriers, too. In the US, whether an address is valid depends on the carrier. For example, my PO box is a valid address when the carrier is the USPS, but not when the carrier is UPS.

To save time, you might try browsing some international address formats on bitboost.

0

Will your logic work if two countries happen to have the same zip code? These two would be pointing to different cities in that case. here are some points to consider

  1. Do you want to use zipcode as a kind of primary key into address? (at lease the city, state and country fields). In that case, you can have zipcode, city,state,country in one table. Create indexes on city, state etc.. (you have a functional dependency of the form zipcode->country,state,city . This as i said may not be true across countries.
  2. If auto populating is your only concern, create a materialized view and use it.

I would recommend reading 'Data Model patterns' by David C. Hay.

bkm
  • 973
  • 7
  • 13
  • this will never be used outside of the US as this is developed for US specific medical claims and laws. – Sean Jul 22 '09 at 18:07
  • Then drop the country details. I am assuming you have knowledge of functional dependencies. You have the following functional dependency. Zip Code -> count, city, state Do you, however, have the following Functional dependencies in your design? (County, City-Name) -> State Name (city,county, state) -> zip code Your design looks highly de-normalized which is theoretically good. You will, however have too many joins in queries.This can affect performance. Note of Functional Dependency a->b means that if for two rows R1 and R2 in the table, if R1(a) = R2(a), then R1(b) should be equal to R2(b). – bkm Jul 23 '09 at 03:38
  • I agree that county is irrelevant however the owner of the company wants them so they have become relevant. Over all I am thinking of implementing Daniel Brouckner above. Just looking now for special case scenarios that would break that logic. – Sean Jul 23 '09 at 19:52
0

But not every person who has a valid medical claim is required by law to remain in the US until the claim is settled. People move.

  • That concern is irrelevant as we are not truly trying to track patients just substantiate that at the time of treatment they were residing in the U.S. – Sean Jul 23 '09 at 19:47
0

San Francisco is a city in California; it's not a city in Alabama. Does your design prevent nonsense entries like "San Francisco, AL"?