
I am developing an address matching application using the Google Geocoding API. The problem is that some of the addresses in the database I am trying to validate look something like:

ATTN: Mr. THOMAS WONG 2457 Yonge St., Toronto, ON, N2S 2V5, Canada

rather than

2457 Yonge St., Toronto, ON, N2S 2V5, Canada

The first string returns no results (because it starts with a person's name), while the second one validates and returns the full, correct address.

My question is: what is the right approach to this issue? I am thinking of writing a function that extracts only the relevant part of the address string, but maybe there are better ideas?

Thank you, M.R.

Matt

2 Answers


If the desired part of the address always starts with a number, try this:

  1. Find the first digit in the string.
  2. Take the substring from that digit to the end of the string.
  3. That substring is the address.
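A minimal Python sketch of those three steps, assuming the input is a plain string and that the address really does begin at the first digit (the function name `strip_leading_name` is hypothetical):

```python
import re
from typing import Optional


def strip_leading_name(raw: str) -> Optional[str]:
    """Return the part of `raw` starting at its first digit, or None if it has no digits."""
    # Step 1: find the first digit in the string.
    match = re.search(r"\d", raw)
    if match is None:
        return None  # the heuristic does not apply to this record
    # Step 2: take the substring from that digit to the end of the string.
    # Step 3: that substring is (assumed to be) the address.
    return raw[match.start():]


print(strip_leading_name("ATTN: Mr. THOMAS WONG 2457 Yonge St., Toronto, ON, N2S 2V5, Canada"))
# 2457 Yonge St., Toronto, ON, N2S 2V5, Canada
```

The assumption matters: for an input like `LISA ANDREW P.O. BOX 55, Kingston, ON, H5F 3C9, ON, Canada` this returns `55, Kingston, ON, H5F 3C9, ON, Canada`, silently dropping the `P.O. BOX` prefix, because the first digit belongs to the box number.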

To parse addresses in general, you need to know all of the possible formats.

Do you need to handle inputs like:

  • Santa, North Pole.
  • The Queen, Great Britain
  • Captain Hootberry
  • Bob Goldenberry, rural route 7, MN
  • Jackie Blam, P.O. Box 78, Hootville, OH

For a comprehensive address parsing solution, you will need several algorithms, one for each address format, and then a way to determine which algorithm to apply based on the input.
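One way to sketch the "several algorithms" idea in Python is an ordered list of per-format rules, where the first rule whose pattern matches decides where the address begins. The rule names, the regular expressions, and `extract_address` are hypothetical and illustrative only, not a complete set of address formats. More specific formats are tried first so that the digits in a P.O. Box are not mistaken for a street number.

```python
import re
from typing import Optional, Tuple

# Hypothetical per-format rules, tried in order. Each pattern marks where the
# address portion of the string begins.
FORMAT_RULES = [
    ("po_box", re.compile(r"\bP\.?\s*O\.?\s*BOX\b", re.IGNORECASE)),
    ("rural_route", re.compile(r"\b(?:RURAL\s+ROUTE|RR)\s*#?\s*\d+", re.IGNORECASE)),
    ("street_number", re.compile(r"\b\d+\b")),
]


def extract_address(raw: str) -> Optional[Tuple[str, str]]:
    """Return (format_name, address_part) for the first matching rule, or None."""
    for name, pattern in FORMAT_RULES:
        match = pattern.search(raw)
        if match:
            # Keep everything from the start of the match to the end of the string.
            return name, raw[match.start():].strip()
    return None  # no known format recognized


print(extract_address("LISA ANDREW P.O. BOX 55, Kingston, ON, H5F 3C9, ON, Canada"))
# ('po_box', 'P.O. BOX 55, Kingston, ON, H5F 3C9, ON, Canada')
```

Inputs with no recognizable format, such as `Santa, North Pole.`, return `None`, which is a natural point to flag the record for manual review instead of guessing.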

DwB
  • This will work for some of the addresses but not for all. For example, I may have: LISA ANDREW P.O. BOX 55, Kingston, ON, H5F 3C9, ON, Canada, etc. I was thinking there may be some address validation application that can solve the problem... – Jun 13 '13 at 17:06
  • This approach would work but makes some strong assumptions about the input. Could you expand it to include more scenarios and handle the common edge cases? – Matt Jun 13 '13 at 18:57
  • This approach makes no hidden assumptions. Read the first line: "If the desired part of the address always starts with a number." PO Box 55 is outside the scope of this solution. If you need more address parsing, show all of your possible input formats. At the time this answer was written, it covered all of the listed input formats. – DwB Jun 13 '13 at 19:04
  • Thank you everybody for your help. Unfortunately I cannot come up with all the scenarios: I checked the database and there are approx. 50,000 addresses that fail to validate even though each string contains a valid address. I am now looking at Matt's solution; it may be helpful. – Jun 13 '13 at 20:28

I work at SmartyStreets and wrote the address extractor that we now offer with the LiveAddress API. It's hard. There are a lot of assumptions you need to force yourself not to make, including "if the address starts with a number." (Sorry DwB, there's a lot to consider.)

If you have US addresses, you may still find our tool useful (it's free to sign up and use, to a point). Here's another Stack Overflow post about the extraction utility: https://stackoverflow.com/a/16448034/1048862

The best way to do this would be to use an address validation service: one that can validate delivery points, not just address ranges. Range checking is the most common kind, so be wary of claims of "address validation" when the service is really just guessing within certain bounds.

Be aware, too, that Google does not validate addresses. It may standardize them, and it will return a result located where the address would be if it existed; if the address is actually valid, it's your lucky day.
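If you do keep using Google, it helps to check how precise each result is rather than treating any hit as valid. Below is a minimal Python sketch, assuming you have already parsed the Geocoding API's JSON response into a dict. It reads the response's `status`, `partial_match` and `geometry.location_type` fields; the function name `geocode_precision` and the labels it returns are hypothetical. Even a `ROOFTOP` result is only a precise geocode, not confirmation that the address is a real delivery point.

```python
from typing import Any, Dict


def geocode_precision(response: Dict[str, Any]) -> str:
    """Summarize how much to trust a parsed Geocoding API JSON response."""
    if response.get("status") != "OK" or not response.get("results"):
        return "no_result"  # e.g. status "ZERO_RESULTS"
    top = response["results"][0]
    if top.get("partial_match"):
        return "partial_match"  # Google matched only part of the input
    location_type = top.get("geometry", {}).get("location_type")
    if location_type == "ROOFTOP":
        return "rooftop"  # precise geocode, still not proof of deliverability
    # Anything else (RANGE_INTERPOLATED, GEOMETRIC_CENTER, APPROXIMATE, or
    # missing): the position was estimated rather than matched exactly.
    return "estimated"
```

A `RANGE_INTERPOLATED` result is exactly the "where the address would exist if it were real" case, so records in that bucket deserve a second check.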

Matt
  • Very interesting, Matt; it seems to be exactly what I am looking for! I will see if it works with Canadian addresses too (our database contains mainly Canadian customers). – Jun 13 '13 at 20:38
  • I wish I could tell you it worked for Canada. If you need a system to extract Canadian addresses, perhaps look at an API provided by Canada Post. Otherwise you'll have to follow logic like what DwB suggested, but be sure to account for nearly all common variations of Canadian addresses. – Matt Jun 13 '13 at 20:54
  • @user441637 SmartyStreets just started offering international address verification – camiblanch Aug 12 '15 at 16:58