1

Suppose I have database tables called Customers, Countries, States and Cities. An individual customer would have a CityID field, which links to the Cities table. Each city would have a StateID field, which links to the states table, and similar for countries.

This is all straightforward when we know the full address of the customer, but sometimes we only have their country, and sometimes only the sate. We don't always have the city.

How do we handle this? We want to be able to save the country, but can't if we don't know the city.

We could add StateID and CountryID fields to the customers table, but that smells like poor design, and could lead to inconsistent data.

Anyone any suggestions? I'm sure this is a pretty standard question, but I can't find a good answer to it.

P.S. In answer to Jaffar's comment below, the reason for doing this is that we need to do some analysis of where our customers are distributed. The client sells hugely expensive medical scanners to hospital groups, and doesn't always know which site will take the scanner when it's ordered. Therefore, we need to be able to specify as much info as we can, which may only be the country, may be the state, or may be the city.

We currently only need to do this for the US, but would prefer to provide a flexible approach in case the client wants to expand the analysis to other countries.

Avrohom Yisroel
  • 8,555
  • 8
  • 50
  • 106
  • Can you elaborate about your program? If you need them then why they arent required? If they are required then why do you have this problem? – Jafar Kofahi Aug 06 '13 at 15:16
  • Addresses are a political thing. You can't expect politics to have "good design" – Richard Aug 06 '13 at 15:24
  • Related: [SQL Database Design Best Practice (Addresses)](http://stackoverflow.com/q/7639637/425809) – Richard Aug 06 '13 at 15:28
  • You might considering asking over on [DBA Stackexchange](http://dba.stackexchange.com/). They are experts over there in database design. – Richard Aug 06 '13 at 16:25

4 Answers4

2

If you look at any data model pattern book, you will find that they abstract geopolitical areas.

Use table inheritance.

Country extends Geopolitical Area, and so does State/Province, County, City (though not Postal Code or Continent).

You can now point a customer at any Geopolitical Area using one column with a foreign key. If you point it at a city, you can derive the state, country. If you point it at the country, then at least you know the country.

This is also useful for tracking tax rates by county, state, country.

Neil McGuigan
  • 46,580
  • 12
  • 123
  • 152
1

As Richard suggested, I asked on dba.stackoverflow.com, although I posed the question slightly differently. I suggested three solutions, the three-table approach (that Richard favours), a self-referencing Locations table approach (that I thought was going to be complex to query) and Juhana's approach of using three tables, but including a blank entry (which seemed to me to be the easiest).

Following the two replies there (follow the link if you want to see them in full), I tried the self-referencing Locations table approach, and found it to be much easier than I had thought. It has the most flexibility of all approaches, as it allows me to link to any level of region, including extra levels not considered yet, doesn't require multiple links from the customer table, but doesn't involve complex queries.

I don't know if it would be as easy using this approach with pure SQL data access, but as I'm using an ORM, the child locations were materialised as a collection property on the entity, making navigation really simple.

Thanks to everyone who replied. I hope this helps someone else.

Community
  • 1
  • 1
Avrohom Yisroel
  • 8,555
  • 8
  • 50
  • 106
0

Make one row per state in the cities table where the name of the city is empty, and one row per country where the city and state are empty. Use them when the city or state is unknown.

JJJ
  • 32,902
  • 20
  • 89
  • 102
  • What if you need a null city *and* null state? – Richard Aug 06 '13 at 15:21
  • @Richard "one row per country where the city and state are empty". How can you have a null state but not null city? – JJJ Aug 06 '13 at 15:36
  • @Juahana Vatican City has a city, but no state or country. (Or a country with no city or state.) Also, [cities with countries and no states](https://drupal.org/node/636464) is a common problem. Guam is an example. – Richard Aug 06 '13 at 15:46
  • I had thought of this, and it is an approach I'm still considering. The empty rows wouldn't be shown in the GUI, but could be used to keep a good database structure, whilst allowing the unknown data. I was just hoping their was another way of doing that didn't involve having to handle placeholder entries. Thx for the suggestion. – Avrohom Yisroel Aug 06 '13 at 15:53
  • @Richard I still don't see the problem; if a country has no states, all its state fields are null in any case. If a country has states, the city itself is meaningless ("Springfield, USA") without the state so it would be mandatory field *in those countries* anyway. – JJJ Aug 06 '13 at 15:56
  • @Juhana The problem is with the database design. It's three tables not one: [CityID, CityName, StateFK] [StatID, StateName, CountryFK] [CountryID, CountryName]. If you want City and Country, there's no way to leave State out of the mix. The original is poor design and your answer doesn't account for that. – Richard Aug 06 '13 at 16:02
  • The design requires that the city be tied to a state. That would require 1 City row with a blank name *per state*. Then 1 State row with a blank name *per country*. So, that adds ~200 Blank states for the ~200 countries. Then a large number of blank cities for each state for each country. How useful of a lookup table is that? – Richard Aug 06 '13 at 16:07
  • @Richard *"Then a large number of blank cities for each state for each country."* --- no. **One** blank city for each state. For example in the US that would mean 50 extra rows (+1 for blank state = 51 in total). Since there aren't that many countries with states, in total there might be about 500 extra rows. In a table that has (potentially) all the cities of the world (tens of thousands rows?) that's practically nothing. – JJJ Aug 06 '13 at 16:10
  • @Juhana True. But if you're going to do that, why not just merge the data into a single table [LocationID, CityID, StateID, CountryID] and allow nulls? That way you can have [xxx, NULL, NULL, 1234] and it would be a lot easier to understand (on every level) than having to scroll past 500 rows of blanks. In a relational database, forcing blank rows is poor desing. – Richard Aug 06 '13 at 16:20
  • Richard, I don't know what you mean by "scrolling past blank rows," in a relational database, you select what you want based on query criteria, you don't scroll through every row looking for it. This approach would work very well when the data is to be used via an ORM in code, as it allows really easy navigation between entities. As I mentioned earlier, my main objection to your suggestion is that you are repeating data, and opening yourself up to invalid data if someone adds a city that's in a different state. Your suggestion has no way of protecting against that, whereas this method is safe. – Avrohom Yisroel Aug 08 '13 at 22:50
0

I think your design is wrong. You should not require a CityID in order to get a state or a StatID in order to get a Country.

What if people live in a part of a state that isn't in a city? What if they live in part of the country that isn't part of a state (or in a country that doesn't have states)? What if they live in a CityState, where the city,state, and country are all the same?

[UserID]
[Address Line 1]
[Address Line 2]
[Address Line 3]
[Address Line 4]
[CityID]
[StateID]
[CountryID]
[Zip Code]

Make them all Foreign Keys.

Also, if you're trying to be as flexible as possible, requiring a City, State, and Country is definitely the opposite of that approach.

A great post on the subject lives here: Is there common street addresses database design for all addresses of the world? Also, valid combinations exist for city, country, and no state.

[Edit]
Since there are so many nay-sayers to what should be standard database design...

Address to Vatican City (No City, No State)

His Holiness Pope Francis
Apostolic Palace
Vatican City

Address to Hard Rock Cafe, Guam (No State)

Hard Rock Cafe Guam
1273 Pale San Vitores Road
Tamuning, Guam 96913

The only way to represent these two addresses is to have three, non-linked fields:

[City] [State] [Country]

This is the standard design for an address table.

Community
  • 1
  • 1
Richard
  • 6,215
  • 4
  • 33
  • 48
  • Pls see myn updated question for what we are trying to do. It's not a case of requiring a CityID to get the others, more that once you have a CityID, you don't need anything else, so adding all three IDs to the customers table is duplication, and can lead to inconsistent data. If we always had country, state and city, then we would only need the CityID in the customers table, and we could get the rest by following links. Our problem is that we don't always know the city of state initially, but want to record the country. Don't know if that changes anything. Thx for the reply. – Avrohom Yisroel Aug 06 '13 at 15:51
  • @AvrohomYisroel And that is exactly why your design will not work! You cannot know if a state will exist or a country or a city. You might have all of them, you might have any one or two of them. Plus, you **cannot** rely on having a State. What about Guam? What about Puerto Rico? What about Vatican City? Trust me. You want *flexibility* not rigidity. – Richard Aug 06 '13 at 16:09
  • Ah, I see where you're coming from, although I'm still not sure that I like the idea of linking all three fields from the customer. I came across an interesting idea of using a self-referencing table, which gives you even more flexibility, as you can have as many (or as few) levels of country/state/city/etc as you like. It's a bit more work, but very flexible, and it has the advantage of forcing data integrity, as you can't have the customer linked to a city which is a different state from the one lined to directly. Need to try both approaches on a sample project and see. Thx – Avrohom Yisroel Aug 06 '13 at 16:32
  • @AvrohomYisroel You also have to keep in mind that there are *numerous* duplicates: [Springfield](https://en.wikipedia.org/wiki/Springfield) is a classic example. It would be better just to leave the Cities, States, and Countries tables as lookup tables rather than source tables. – Richard Aug 06 '13 at 16:50
  • What do duplicates have to do with it? I don't care if there are multiple Springfields, as long as they are linked correctly, there wouldn't be a problem. – Avrohom Yisroel Aug 08 '13 at 22:52
  • @AvrohomYisroel It's hard to pick the correct city when you bind a drop-down list to a table with duplicates. Granted, it's unlikely to happen if you design the dropdown correctly. However, good database design should be *flexible*. – Richard Aug 12 '13 at 11:21
  • Where do you get this idea of duplicates from? If I bind a dropdown to the countries, and the user chooses one, the second dropdown will only be populated with states in that country. Similarly, once they've picked a state, the third dropdown will only be bound with cities in that state. It doesn't matter if there is a Springfield in every state in every country of the world, the user can't possibly pick the wrong one. As it happens, this is academic now, as I mentioned in my reply, I tried using a self-referencing table, and it was really easy and hugely more flexible than any other approach. – Avrohom Yisroel Aug 13 '13 at 13:57
  • @AvrohomYisroel True, it is academic. :) The design you mention is why I said *"it's unlikely to happen if you design the dropdown correctly"*. However, this design fails in some cases--specifically those where there are no states. You can still design around those issues, but the design starts getting more complicated quickly. The Geopolitical regions answer is probably the best solution. It solves for all the problems. – Richard Aug 13 '13 at 15:12