140

I am a programmer and need a practical approach to storing street address structures of the world in a database. So which is the best and common database design for storing street addresses? It should be simple to use, fast to query and dynamic to store all street addresses of the world.

Edward Ross
  • 2,614
  • 1
  • 18
  • 17
Arsen Mkrtchyan
  • 49,896
  • 32
  • 148
  • 184
  • See also: http://stackoverflow.com/a/21023793/59087 – Dave Jarvis Jan 30 '14 at 11:38
  • 1
    You asked about street addresses, but all the answers are about postal addresses ([what's the difference?](http://painintheenglish.com/case/3604)). Perhaps the title should be changed? – wrygiel Oct 19 '16 at 06:34
  • This question fits "a practical, answerable problem that is unique to software development" so should be kept open. – Edward Ross Mar 19 '21 at 13:30

12 Answers12

142

It is possible to represent addresses from lots of different countries in a standard set of fields. The basic idea of a named access route (thoroughfare) which the named or numbered buildings are located on is fairly standard, except in China sometimes. Other near universal concepts include: naming the settlement (city/town/village), which can be generically referred to as a locality; naming the region and assigning an alphanumeric postcode. Note that postcodes, also known as zip codes, are purely numeric only in some countries. You will need lots of fields if you really want to be generic.

The Universal Postal Union (UPU) provides address data for lots of countries in a standard format. Note that the UPU format holds all addresses (down to the available field precision) for a whole country, it is therefore relational. If storing customer addresses, where only a small fraction of all possible addresses will be stored, its better to use a single table (or flat format) containing all fields and one address per row.

A reasonable format for storing addresses would be as follows:

  • Address Lines 1-4
  • Locality
  • Region
  • Postcode (or zipcode)
  • Country

Address lines 1-4 can hold components such as:

  • Building
  • Sub-Building
  • Premise number (house number)
  • Premise Range
  • Thoroughfare
  • Sub-Thoroughfare
  • Double-Dependent Locality
  • Sub-Locality

Frequently only 3 address lines are used, but this is often insufficient. It is of course possible to require more lines to represent all addresses in the official format, but commas can always be used as line separators, meaning the information can still be captured.

Usually analysis of the data would be performed by locality, region, postcode and country and these elements are fairly easy for users to understand when entering data. This is why these elements should be stored as separate fields. However, don't force users to supply postcode or region, they may not be used locally.

Locality can be unclear, particularly the distinction between map locality and postal-locality. The postal locality is the one deemed by a postal authority which may sometimes be a nearby large town. However, the postcode will usually resolve any problems or discrepancies there, to allow correct delivery even if the official post-locality is not used.

Edward Ross
  • 2,614
  • 1
  • 18
  • 17
  • 1
    Can you give a URL for the UPU? (Yeah, I know I could find it - but the best answers don't make people do the search.) – Jonathan Leffler May 31 '09 at 02:05
  • Try http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml and choose the appropriate country in the drop-down – barrowc May 31 '09 at 03:30
  • Added URL for UPU Post*Code product – Edward Ross Jun 02 '09 at 09:48
  • 21
    Also, some countries (Republic Of Ireland for example) do not use Zip codes. If I had a cent for the number of times I've had to entere na (not applicable) as a zip code because it's a required field man . . . I'd have five or six cent by now :) – Binary Worrier Jun 02 '09 at 10:08
  • if the UPU has downloadable lists, currently, they have done a good job of keeping them very well hidden. – James John McGuire 'Jahmic' Nov 09 '13 at 09:51
  • In this format, how do you deal with province/county, or state/county? – Pablo Fernandez Nov 29 '18 at 11:38
  • 1
    Of all the things I'd like to find on torrent trackers, a list of deliverable addresses on earth (unauthorizedly copied from respective post offices) is surprisingly missing. – William Entriken Jun 23 '19 at 20:54
  • Any chance of getting updated links to the relevant UPU pages above? Getting a 404 with the ones above and I couldn't locate the corresponding data on their website. – Vincent Aug 31 '20 at 15:22
  • 1
    @Vincent, not sure if you managed to find it. I did not immediately locate an up-to-date version of the page; however, it does appear to have been indexed by Wayback Machine: https://web.archive.org/web/20100515115432/http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml – Spencer D Oct 04 '20 at 21:12
54

Have a look at Database Answers. Specifically, this covers many cases:

(All variable length character datatype)

AddressId
Line1
Line2
Line3
City
ZipOrPostcode
StateProvinceCounty
CountryId
OtherAddressDetails

enter image description here

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • I didn't downvote, but I think the only way this could work was if all fields but AddressId and Line1 were optional. In which case, it is not too useful. –  May 30 '09 at 22:50
  • 11
    Data types are important--not every country has integer ZIP codes! Had a coworker find this out quick with a customer in Canada. – Eric May 30 '09 at 22:53
  • 1
    @Eric: Other than Id fields all those fields are character datatypes – Mitch Wheat May 31 '09 at 01:28
  • 2
    For country ID, you should use the ISO 3166 2-letter (or 3-letter) country code. The proposed schema allows you to store an analyzed address; it does not tell you about how to format it. (Oh, and the UK has alphanumeric post codes - IP31 3GH, SE1W 9PQ, etc. I think the second group is always NAA; the first group starts with A and contains at least one N (A = alpha, N = digit), but nothing would surprise me.) – Jonathan Leffler May 31 '09 at 02:09
  • @Neil: Exactly. There's so much variation by country that you can't use a single table and expect the db to validate it. – Dave Sherohman Jun 02 '09 at 12:07
  • I can't seem to find that specific schema on databaseanswers.org, mind linking directly to it? – Alix Axel Sep 15 '11 at 13:54
  • Link is dead :( Maybe you meant to link to where katsuya is linking? – Stefan Falk Feb 29 '16 at 17:59
  • @JonathanLeffler I would recommend using the alpha2 ISO 3166, read first paragraph here: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 -- alpha_2 is used mostly for post stuff, whereas alpha3 is mostly used for passports and aviation standards. – LoopsGod Dec 14 '20 at 00:16
28

Ask yourself what is the main purpose of storing this data? Do you intend to actually send mail to the person at the address? Track demographics, populations? Be able to ask callers for their correct address as part of some basic authentication/verification? All of the above? None of the above?

Depending on your actual need, you will determine either a) it doesn't really matter, and you can go for a free-text approach, or b) structured/specific fields for all countries, or c) country specific architecture.

andora
  • 1,326
  • 1
  • 13
  • 23
13

Sometimes the closest you can get to a street address is the city.

I once had a project to put all the Secondary Schools in India in Google Maps. I wrote a spiffy program using the Google API and thought it would be quite easy.

Then I got the data from the client. Some school addresses were things like "Across from the market, next to the barber" or "Near old bus stand".

It made my task much harder since, unfortunately, the Google API does not support that format.

Emily
  • 9,926
  • 4
  • 31
  • 39
  • 3
    Asian addresses are notorious for this as well. "73rd Block West Ninjang St, Building 2, Take Second Upper Elevator, Office complex beside food court, 468th Industrial District, Shanghai 456789"... – ruhnet May 18 '18 at 15:37
10

For international addresses, it is remarkably hard to find a way to format the information if it is broken down into fields. As a for instance, an Italian address uses:

<street address>
<zip> <town> <region>
<country>

Such as

Via Eroi della Repubblica
89861 Tropea VV
Italy

That is rather different from the order for US addresses - on the second line.

See also the SO questions:

Also check out tag 'postal-code'.


Edit: Reverse order of region and town - per UPU

Community
  • 1
  • 1
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
5

Maybe this is useful: https://gist.github.com/259744 For a project I collected a table of informations about all countries of the world, including ISO codes, top level domain, phone code, car sign, length and regex of zip. Country names and comments unfortunately only in German...

Hraban
  • 545
  • 5
  • 10
4

Differently of other answers here, I believe it's possible to have an structured address database.

Just out of the hat, I can think of the following structure:

  • Country
  • Region (State / Province)
  • Locality (City / Municipality)
  • Sub-Locality (County / other sub-division of a locality)
  • Street

But how to query it fast enough?

One way I always think it can be accomplished is to ask for the ZIP Code (or Postal Code) which varies from country to country, but is solid within the country.

This way you can structure your data around the information provided by the postal offices around the world.

Paulo Santos
  • 11,285
  • 4
  • 39
  • 65
2

No, there are no standard addressing scheme. It usually varies from country-to-country. Even the Universal Postal Union said on Adressing the world, an address for everyone that there is none. The best solution for this is to use the 2/3-letter country code standards known as ISO 3166 and treat everything else by country's standards.

However, if you really are desperate to use easily accessible tools for your project, you can try Google Place API.

Abel Callejo
  • 13,779
  • 10
  • 69
  • 84
2

No, absolutely not. If you compare the way US and Japanese addresses work, you'll see that it's not possible.

UPDATE:

On second thought, anything can be done, but there's a trade-off.

One approach is to model the problem with address and address_attribute tables, with a 1:m relationship between them, anything can be modeled. The address_attribute table would have a pk, a name, a value, and an fk that points back to its address parent's pk. It's almost like using a Map with name, value pairs.

The trade-off is having to do a JOIN every time you want an address. You also have to interrogate the names of the address_attributes to figure out what you're dealing with each time.

Another approach would be to do more comprehensive research on how addresses are modeled around the world. In an object-oriented world you might have the western Address class (street1/street2/city/state/zip) and others for Japan, China, as many as needed to tile the address space. Then you'd have a master Address table and child tables to the other types with a 1:1 relationship between them.

How does Amazon or eBay do it? They ship internationally. Do they have locale-specific UI features? I've only used the US locale.

duffymo
  • 305,152
  • 44
  • 369
  • 561
2

Depends on how free-form you are prepared to go with the fields. One free-form address field will obviously always do, but be of relatively little help narrowing down geography.

The problem you'll have is that there is too much variation in the level of geographical hierarchy across countries. Heck, some countries do not even have 'street addresses' everywhere.

I recommend you don't try to make it too clever.

jerryjvl
  • 19,723
  • 7
  • 40
  • 55
2

Len Silverston of Universal Data Model fame recommends a separate hierarchy of GEOGRAPHIC BOUNDARIES and depending on how much free-formed-ness you're willing to accept either simple STREET ADDRESS LINEs or per-country derivatives.

David Schmitt
  • 58,259
  • 26
  • 121
  • 165
  • 1
    True, and the models Silverston came up with are pretty good and cover a lot of ground but I still don't think such complexity is applicate to the web (at this point), specially from the end-user perspective. In the end, usuability (almost) always wins. – Alix Axel Oct 13 '11 at 09:07
1

Your design should strongly depend from your purpose. Some people have posted how to structure data. So if you simply want to send s-mail to someone, it will do. Things begin to complicate if you want to use this data for navigation. Car navigation will require additional structures to contain traffic info (eg one-way roads), while foot navigation will require a lot of additional data. Here is small example: in my city, my neighborhood is near the park. Next to the park is former airfield (in fact, one of the oldest in Europe) turned into aviation museum. Next to aviation museum is a business park. Street number for museum is 39, while business park numbers start with 39A. So it may seem that 39 and 39A are close – but it takes about a mile to walk from one to another (and even longer if going by car) .
This is just a small example taken from my city, I think you can probably find a lot of exceptions (especially in rural or wilder parts of every country).

smok1
  • 2,940
  • 26
  • 35