12

I'm using GeoPy to geocode addresses to lat,lng. I would also like to extract the itemized address components (street, city, state, zip) for each address.

GeoPy returns a string with the address -- but I can't find a reliable way to separate each component. For example:

123 Main Street, Los Angeles, CA 90034, USA =>
{street: '123 Main Street', city: 'Los Angeles', state: 'CA', zip: 90034, country: 'USA'}

The Google geocoding API does return these individual components... is there a way to get these from GeoPy? (or a different geocoding tool?)

Kara
lubar

4 Answers

28

You can also get the individual address components from the Nominatim() geocoder (which is the standard open source geocoder from geopy).

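from geopy.geocoders import Nominatim

# Nominatim needs a user agent that identifies your application
geolocator = Nominatim(user_agent="specify_your_app_name_here")

# address is a string, e.g. 'Berlin, Germany'
# addressdetails=True does the magic and also returns the individual components
location = geolocator.geocode(address, addressdetails=True)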

print(location.raw)

gives

{'type': 'house',
 'class': 'place',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
 'display_name': '2, Stralauer Allee, Fhain, Friedrichshain-Kreuzberg, Berlin, 10245, Deutschland',
 'place_id': '35120946',
 'osm_id': '2825035484',
 'lon': '13.4489063',
 'osm_type': 'node',
 'address': {'country_code': 'de',
             'road': 'Stralauer Allee',
             'postcode': '10245',
             'house_number': '2',
             'state': 'Berlin',
             'country': 'Deutschland',
             'suburb': 'Fhain',
             'city_district': 'Friedrichshain-Kreuzberg'},
 'lat': '52.5018003',
 'importance': 0.421,
 'boundingbox': ['52.5017503', '52.5018503', '13.4488563', '13.4489563']}

with

location.raw['address']

you get the dictionary with the components only.

Take a look at geopy documentation for more parameters or Nominatim for all address components.
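To get from that dictionary to the shape asked for in the question, a small mapping helper might look like the sketch below. `to_components` is a hypothetical name, not part of geopy. It assumes the keys shown in the output above, treats `city`, `town`, and `village` as alternative locality keys (an assumption about what Nominatim may return), and puts the house number before the road, US style.

```python
def to_components(address):
    """Map a Nominatim-style address dict to street/city/state/zip/country.

    Hypothetical helper, not part of geopy. Every key is treated as
    optional because the set of keys varies from result to result.
    """
    # Join house number and road, skipping whichever one is missing.
    street_parts = [address.get('house_number'), address.get('road')]
    street = ' '.join(part for part in street_parts if part)

    # Assumption: the locality may sit under 'city', 'town', or 'village'.
    city = (address.get('city') or address.get('town')
            or address.get('village') or '')

    return {
        'street': street,
        'city': city,
        'state': address.get('state', ''),
        'zip': address.get('postcode', ''),
        'country': address.get('country', ''),
    }


# The 'address' dictionary from the output above.
sample = {
    'country_code': 'de',
    'road': 'Stralauer Allee',
    'postcode': '10245',
    'house_number': '2',
    'state': 'Berlin',
    'country': 'Deutschland',
    'suburb': 'Fhain',
    'city_district': 'Friedrichshain-Kreuzberg',
}
components = to_components(sample)
print(components)
```

For this particular result there is no `city` key at all (Berlin shows up as the `state`), so it is probably worth checking which keys your own addresses actually come back with before relying on a fixed mapping.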

Skippy le Grand Gourou
Kroenig
  • Missing this line from the code that confused me: `geolocator = Nominatim(user_agent="specify_your_app_name_here")` – Justin Furuness Feb 24 '21 at 04:09
  • Apparently the user agent string doesn't matter, but you need to have it to use Nominatim without violating their TOS according to their docs. – Justin Furuness Feb 24 '21 at 04:10
5

Use usaddress by DataMade. Here's the GitHub repo.

It works like this: usaddress.parse('123 Main St. Suite 100 Chicago, IL') returns this list of (token, label) tuples:

[('123', 'AddressNumber'), ('Main', 'StreetName'), ('St.', 'StreetNamePostType'), ('Suite', 'OccupancyType'), ('100', 'OccupancyIdentifier'), ('Chicago,', 'PlaceName'), ('IL', 'StateName')]
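If you would rather have one value per label than a list of tokens, the pairs can be grouped with plain Python. The sketch below works on the list shown above and does not call usaddress itself; `group_by_label` is a hypothetical helper name, and stripping the trailing comma from tokens is my own choice.

```python
def group_by_label(tagged_tokens):
    """Collapse (token, label) pairs into a {label: text} dictionary.

    Hypothetical helper. Tokens that share a label are joined with a
    space, in the order they appear.
    """
    grouped = {}
    for token, label in tagged_tokens:
        # Drop the trailing comma that stays attached to tokens like 'Chicago,'.
        grouped.setdefault(label, []).append(token.rstrip(','))
    return {label: ' '.join(tokens) for label, tokens in grouped.items()}


# The list returned for '123 Main St. Suite 100 Chicago, IL' above.
parsed = [('123', 'AddressNumber'), ('Main', 'StreetName'),
          ('St.', 'StreetNamePostType'), ('Suite', 'OccupancyType'),
          ('100', 'OccupancyIdentifier'), ('Chicago,', 'PlaceName'),
          ('IL', 'StateName')]
grouped = group_by_label(parsed)
print(grouped)
```

Note that this merges tokens with the same label even when they are not adjacent, which may not be what you want for unusual addresses.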

stevevance
2

This is how I implemented such a split, since I wanted the resulting address to always be in the same format. You would just have to skip the concatenation and return each value, or put them in a list. Up to you.

from geopy.geocoders import Nominatim

def getaddress(self, lat, lng, language="en"):
    """Reverse-geocode a coordinate into a fixed-format address string."""
    try:
        # Nominatim needs a user agent that identifies your application
        geolocator = Nominatim(user_agent="specify_your_app_name_here")
        location = geolocator.reverse((lat, lng), language=language)
        data = location.raw['address']

        # Components missing from the result fall back to an empty string
        street      = data.get('road', '')
        district    = data.get('city_district', '')
        postalCode  = data.get('postcode', '')
        state       = data.get('state', '')
        country     = data.get('country', '')
        countryCode = data.get('country_code', '').upper()
        address = ' '.join([street, district, postalCode, state, country, countryCode])
    except Exception:
        # Covers network errors and a lookup that returned no result
        address = "Error"
    return address
1

I helped write one not long ago called LiveAddress; it was just upgraded to support single-line (freeform) addresses and implements geocoding features.

GeoPy is a geocoding utility, not an address parser/standardizer. The LiveAddress API is, however, and it can also verify the validity of the address for you, filling in the missing information. You'll find that services such as Google and Yahoo approximate the address, while a CASS-Certified service like LiveAddress actually verifies it and won't return results unless the address is real.

After doing a lot of research and development while implementing LiveAddress, I wrote a summary in this Stack Overflow post. It documents some of the crazy-yet-complete formats that addresses can come in and ultimately leads to a solution for the parsing problem (for US addresses).

To parse a single-line address into components using Python, simply put the entire address into the "street" field:

import json
import pprint
from urllib.parse import urlencode
from urllib.request import urlopen

LOCATION = 'https://api.qualifiedaddress.com/street-address/'
QUERY_STRING = urlencode({  # the entire query string must be URL-encoded
    'auth-token': 'YOUR_API_KEY_HERE',
    'street': '1 infinite loop cupertino ca 95014'
})
URL = LOCATION + '?' + QUERY_STRING

# Fetch the response and decode the JSON body
response = urlopen(URL).read()
structure = json.loads(response)
pprint.pprint(structure)

The resulting JSON object will contain a components object which will look something like this:

"components": {
        "primary_number": "1",
        "street_name": "Infinite",
        "street_suffix": "Loop",
        "city_name": "Cupertino",
        "state_abbreviation": "CA",
        "zipcode": "95014",
        "plus4_code": "2083",
        "delivery_point": "01",
        "delivery_point_check_digit": "7"
}

The response will also include the combined first_line and delivery_line_2 so you don't have to manually concatenate those if you need them. Latitude/longitude and other information is also available about the address.
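To turn a components object like that into the street/city/state/zip shape from the question, something along these lines should work. It is only a sketch against the sample shown above: `to_simple_address` is a hypothetical helper, not part of any client library, and it assumes every field may be absent.

```python
def to_simple_address(components):
    """Reduce a 'components' object to street, city, state, and zip.

    Hypothetical helper written against the sample response above.
    """
    # Build the street line from whichever street fields are present.
    street_keys = ('primary_number', 'street_name', 'street_suffix')
    street = ' '.join(components[key] for key in street_keys
                      if components.get(key))

    # Append the ZIP+4 suffix only when both parts are available.
    zip_code = components.get('zipcode', '')
    if zip_code and components.get('plus4_code'):
        zip_code = zip_code + '-' + components['plus4_code']

    return {
        'street': street,
        'city': components.get('city_name', ''),
        'state': components.get('state_abbreviation', ''),
        'zip': zip_code,
    }


# The sample components object from above.
sample_components = {
    "primary_number": "1",
    "street_name": "Infinite",
    "street_suffix": "Loop",
    "city_name": "Cupertino",
    "state_abbreviation": "CA",
    "zipcode": "95014",
    "plus4_code": "2083",
    "delivery_point": "01",
    "delivery_point_check_digit": "7",
}
simple = to_simple_address(sample_components)
print(simple)
```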

Martijn Pieters
Matt
  • Thanks Matt -- this is very helpful. I tried out LiveAddress on the website and it looks good and may be the solution for my app. However, my original question of how to get the components out of GeoPy still stands -- any ideas? – lubar Jul 09 '12 at 18:18
  • Sure. It's tempting to split on commas, but that will yield unreliable/inconsistent results because the geocoding services with which GeoPy integrates format their results differently; and addresses, by their nature, vary considerably. It looks like GeoPy uses the deprecated Google Maps v2 API, which returns components in the `AddressDetails` field. I wonder if you could change line 147 of `google.py` to read from that field instead, but you'd probably have to accommodate reading an object, not a single string... – Matt Jul 09 '12 at 19:23
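
A quick way to see the comma-splitting problem is to run the same naive split over two address strings that appear on this page. This is only an illustration; `naive_split` is a throwaway helper, not something from GeoPy.

```python
def naive_split(address):
    """Split an address string on commas and trim whitespace."""
    return [part.strip() for part in address.split(',')]


# The example from the question and the display_name from the Nominatim answer.
us_style = naive_split('123 Main Street, Los Angeles, CA 90034, USA')
osm_style = naive_split('2, Stralauer Allee, Fhain, Friedrichshain-Kreuzberg, '
                        'Berlin, 10245, Deutschland')

print(len(us_style), us_style)
print(len(osm_style), osm_style)
```

The first string yields four parts with the state and ZIP fused together, while the second yields seven with the house number on its own, so a fixed position cannot be mapped to "street" or "city" across both.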