I am using Python 3 and the Google Geocoding API to gather address information, and I am having a difficult time consistently parsing the address components I get back. The 'address_components' section of the JSON is a list of dictionaries rather than a dictionary, and I can't find any mention of how to pull values out of a list by name. Maybe I need to force the 'address_components' list to behave like a dictionary? Specifically, my problem occurs in situations like the one below:

import urllib.request
import urllib.parse
import json

url = "http://maps.googleapis.com/maps/api/geocode/json?" + urllib.parse.urlencode({"sensor": "false", "output": "more", "address": "The White House"})
uo = urllib.request.urlopen(url)
data = uo.read().decode()
js = json.loads(str(data))
print('Full js:', js)
# which nets good json text as inserted in-line below:

Full js: {'results': [{'formatted_address': 'The White House, 1600 Pennsylvania Ave NW, Washington, DC 20500, USA', 'geometry': {'location': {'lng': -77.0365298, 'lat': 38.8976763}, 'viewport': {'southwest': {'lng': -77.0378787802915, 'lat': 38.8963273197085}, 'northeast': {'lng': -77.0351808197085, 'lat': 38.8990252802915}}, 'location_type': 'APPROXIMATE'}, 'address_components': [{'long_name': 'The White House', 'types': ['point_of_interest', 'establishment'], 'short_name': 'The White House'}, {'long_name': '1600', 'types': ['street_number'], 'short_name': '1600'}, {'long_name': 'Pennsylvania Avenue Northwest', 'types': ['route'], 'short_name': 'Pennsylvania Ave NW'}, {'long_name': 'Northwest Washington', 'types': ['neighborhood', 'political'], 'short_name': 'Northwest Washington'}, {'long_name': 'Washington', 'types': ['locality', 'political'], 'short_name': 'Washington'}, {'long_name': 'District of Columbia', 'types': ['administrative_area_level_1', 'political'], 'short_name': 'DC'}, {'long_name': 'United States', 'types': ['country', 'political'], 'short_name': 'US'}, {'long_name': '20500', 'types': ['postal_code'], 'short_name': '20500'}], 'partial_match': True, 'place_id': 'ChIJ37HL3ry3t4kRv3YLbdhpWXE', 'types': ['point_of_interest', 'establishment']}], 'status': 'OK'}

# Thanks to the json library (I believe), I can pull data out of the above as if it were a mixed list & dictionary by using named references and indexed locations:
rte = js['results'][0]['address_components'][2]
print("Route:", rte)

Route: {'long_name': 'Pennsylvania Avenue Northwest', 'types': ['route'], 'short_name': 'Pennsylvania Ave NW'}

# unfortunately though, the lists change structure (after the initial 'results' category) and so the route is not always the third element, as seen below:
url2 = "http://maps.googleapis.com/maps/api/geocode/json?" + urllib.parse.urlencode({"sensor": "false", "output": "more", "address": "The Breakers"})
uo2 = urllib.request.urlopen(url2)
data2 = uo2.read().decode()
js2 = json.loads(str(data2))
address_components2 = js2['results'][0]['address_components'][2]
print("address_components2:", address_components2)

address_components2: {'long_name': 'Suffolk County', 'types': ['administrative_area_level_2', 'political'], 'short_name': 'Suffolk County'}

Is there any way around this issue? How can I always get the component whose type is 'route', regardless of its position in the list?

leerssej

1 Answer

As shown in the Python example in Processing JSON with Javascript, json.load() is used to parse the response, and the formatted_address values from the results array are then displayed to the user. To get only the data you need, you can use the built-in filter() function whenever the JSON response contains multiple values.
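A minimal sketch of how filter() might be applied to the address_components list follows. The helper names get_component and components_by_type are hypothetical, not part of any Google library, and the sample dictionary is trimmed from the response printed in the question. It assumes every component carries a 'types' list, as that output suggests.

```python
def get_component(result, component_type, key="long_name"):
    """Return the value of the first component whose 'types' contains component_type."""
    matches = filter(
        lambda component: component_type in component["types"],
        result["address_components"],
    )
    first = next(matches, None)  # None when no component has that type
    return first[key] if first is not None else None


def components_by_type(result, key="long_name"):
    """Build a dict mapping each type name to the first matching component's value."""
    by_type = {}
    for component in result["address_components"]:
        for type_name in component["types"]:
            # setdefault keeps the first value seen for types shared by several components
            by_type.setdefault(type_name, component[key])
    return by_type


# One entry of js['results'], trimmed from the output shown in the question.
sample = {
    "address_components": [
        {"long_name": "1600", "types": ["street_number"], "short_name": "1600"},
        {"long_name": "Pennsylvania Avenue Northwest", "types": ["route"],
         "short_name": "Pennsylvania Ave NW"},
        {"long_name": "Washington", "types": ["locality", "political"],
         "short_name": "Washington"},
        {"long_name": "20500", "types": ["postal_code"], "short_name": "20500"},
    ]
}

print("Route:", get_component(sample, "route"))
print("Postal code:", components_by_type(sample)["postal_code"])
```

With the question's variables this would be called as get_component(js['results'][0], 'route'). Because the lookup goes by the 'types' entry rather than by list index, it does not matter where the route appears in the list, and a missing type yields None instead of an IndexError.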

Helpful explanation regarding processing JSON responses can also be found in this related SO post - How to reverse geocode serverside with python, json and google maps?.

Teyam
  • `formatted_address` is a different section of the JSON returned and definitely not what we want here. Unfortunately, `formatted_address` is the undifferentiated concatenation of all the components: it loses the dictionary-like labelling of its parsed components, which is found in the `address_components` section of the returned JSON. Specifically, is there any way to get the single component named *'route'* out of every one of the JSON results returned? – leerssej Jun 06 '16 at 03:30
  • The `'route'` component always has the same name, but its position in the returned JSON results moves around. Unfortunately, my Python interpreter says that the address components section of the results is a list, and that I can only request items from a list by integer position. Is this true? How could I get around this requirement and access these labelled entries with dictionary-like named requests? Is there perhaps some way to adjust the Google API results to avoid the issue entirely? – leerssej Jun 06 '16 at 03:38