I have two Python lists:

    li = ['206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111',
          '340 Independence Drive, Homewood, AL 35209',
          '41 Doell Drive Southeast, Huntsville, AL 35801',
          '3 Mobile Circle, Suite 401, Mobile, AL 36607',
          '7209 Copperfield Drive, Montgomery, AL 36117']

    mi = ['340 Independence Dr Homewood, AL 35209',
          '41 Doell Dr SE, Ste 24 Huntsville, AL 35801',
          '3 Mobile Cir, Ste 401 Mobile, AL 36607',
          '36 Saint Lukes Dr Montgomery, AL 36117',
          '91 Kanis Rd, Ste 300 Little Rock, AR 72205',
          '25 S Dobson Rd, Bldg J Chandler, AZ 85224']

I want to loop through `li` and find the records that do not exist in `mi`, using some kind of partial text match. I tried `startswith` and `in`, but they fail because of differences such as "Dr" vs. "Drive" and "Ste" vs. "Suite". Any suggestions? Would some kind of Python regex work? The output should be '206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111' and '7209 Copperfield Drive, Montgomery, AL 36117'.

Ronron
  • I would search for a library that normalizes street addresses (related [post](https://stackoverflow.com/questions/4838268/normalizing-street-addresses-in-django-python)). – jarmod Jan 07 '22 at 17:00
  • see https://stackoverflow.com/questions/4838268/normalizing-street-addresses-in-django-python – Stuart Jan 07 '22 at 17:01

1 Answer

If you are doing this for fun, remember that addresses are read from the bottom up, because getting the letter to at least the right building is the biggest step:

  1. The city, state, and ZIP code are the most important.
  2. The street address is the second most important, along with the apartment number.
  3. The addressee is the least important item.
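
As a rough illustration of that bottom-up order, here is a minimal sketch that compares addresses by ZIP code first, then house number, then the first word of the street name. The helper names `address_key` and `missing_from` are made up for this example, and it assumes US-style addresses that start with a house number and end with a 5-digit ZIP. It is not real address standardization, so it will miss typos and unusual formats.

```python
import re

li = ['206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111',
      '340 Independence Drive, Homewood, AL 35209',
      '41 Doell Drive Southeast, Huntsville, AL 35801',
      '3 Mobile Circle, Suite 401, Mobile, AL 36607',
      '7209 Copperfield Drive, Montgomery, AL 36117']

mi = ['340 Independence Dr Homewood, AL 35209',
      '41 Doell Dr SE, Ste 24 Huntsville, AL 35801',
      '3 Mobile Cir, Ste 401 Mobile, AL 36607',
      '36 Saint Lukes Dr Montgomery, AL 36117',
      '91 Kanis Rd, Ste 300 Little Rock, AR 72205',
      '25 S Dobson Rd, Bldg J Chandler, AZ 85224']


def address_key(address):
    """Hypothetical helper: (zip, house number, first street word), or None."""
    # ZIP code: the last 5 digits in the string, optionally followed by "-1234".
    zip_match = re.search(r"(\d{5})(?:-\d{4})?\s*$", address)
    # House number and the first word after it, taken from the start.
    street_match = re.match(r"\s*(\d+)\s+(\w+)", address)
    if not zip_match or not street_match:
        return None
    return (zip_match.group(1),
            street_match.group(1),
            street_match.group(2).lower())


def missing_from(candidates, reference):
    """Hypothetical helper: candidates whose key has no match in reference."""
    reference_keys = {address_key(a) for a in reference}
    # Unparseable reference entries must not match unparseable candidates.
    reference_keys.discard(None)
    return [a for a in candidates if address_key(a) not in reference_keys]


missing = missing_from(li, mi)
for address in missing:
    print(address)
# Prints:
# 206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111
# 7209 Copperfield Drive, Montgomery, AL 36117
```

The key deliberately ignores everything that tends to be abbreviated ("Dr"/"Drive", "Ste"/"Suite"), which is why it works on this sample. Any address that cannot be parsed is reported as missing so it can be checked by hand.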

The following two methods have a significant advantage over anything you might do with an alias list for abbreviations: they compare the "standardized" address against a database of all deliverable addresses.

If you are doing a one-off project and confidentiality is not an issue, you can use the ZIP code lookup on the U.S. Postal Service website. It returns the standardized address as well. You can automate its use to some extent.

If you are going to process more than 1,000 addresses on a recurring basis, get an address standardization software package, usually sold as mailing software. Prices start at around US$600 per year.

betacrash