I have 500k+ geographical points, each with a latitude and longitude. I use the function below to determine the country each point falls in.

import json
import urllib2  # Python 2 module

def findCountry(lat, lon):
    # Reverse-geocode one point; return its country name, or None if no country is found.
    url = 'http://maps.googleapis.com/maps/api/geocode/json?latlng=%s,%s&sensor=false' % (lat, lon)
    data = json.load(urllib2.urlopen(url))
    for result in data['results']:
        for component in result['address_components']:
            if 'country' in component['types']:
                return component['long_name']
    return None

I call findCountry() like this:

df3['country'] = df3.apply(lambda row: findCountry(row['StartLat'], row['StartLong']), axis=1)

But for 500K+ points it's taking an extremely long time to complete. Can I optimise this function, or is there a built-in function that would get it done more quickly?

Any help would be appreciated.

user3447653

1 Answer

You might well already know about geopy's geocoders module (see http://geopy.readthedocs.io/en/latest/#module-geopy.geocoders). It provides uniform access to a variety of geocoding services. The documentation (https://media.readthedocs.org/pdf/geopy/latest/geopy.pdf) lists them on page 47. I have no idea which of them, if any, might be faster than what you're using. However, given the size of your task, they might be worth investigating. The code below is a first cut at finding out what's on offer: it tries a reverse lookup of one point with every service.

import geopy.geocoders
import inspect
import string

# Keep only public members whose names start with a capital letter.
serviceNames = [name for name, _ in inspect.getmembers(geopy.geocoders)
                if not name.startswith('__')
                and name[0] in string.ascii_uppercase]
print(serviceNames)

for serviceName in serviceNames:
    try:
        # Instantiate with no arguments; services that need a key fail here.
        geolocator = getattr(geopy.geocoders, serviceName)()
    except Exception:
        print(serviceName, 'requires access code, or not intended for this purpose')
        continue
    try:
        result = geolocator.reverse('43.2,65.4')
        print(serviceName, str(result).encode('ascii', 'ignore'))
    except Exception:
        # Any failure lands here, including network or quota errors.
        print(serviceName, 'reverse unsupported?')

My assumption is that only the members beginning with capitals represent services. Here's the output.

['ArcGIS', 'Baidu', 'Bing', 'DataBC', 'GeoNames', 'GeocodeFarm', 'GeocoderDotUS', 'GeocoderNotFound', 'GoogleV3', 'IGNFrance', 'LiveAddress', 'NaviData', 'Nominatim', 'OpenCage', 'OpenMapQuest', 'Photon', 'SERVICE_TO_GEOCODER', 'What3Words', 'YahooPlaceFinder', 'Yandex']
ArcGIS reverse unsupported?
Baidu requires access code, or not intended for this purpose
Bing requires access code, or not intended for this purpose
DataBC reverse unsupported?
GeoNames requires access code, or not intended for this purpose
GeocodeFarm reverse unsupported?
GeocoderDotUS reverse unsupported?
GeocoderNotFound reverse unsupported?
GoogleV3 b'[Location(Tamdy District, Uzbekistan, (42.54183, 65.2488422, 0.0)), Location(Navoiy Province, Uzbekistan, (42.6988575, 64.6337685, 0.0)), Location(Uzbekistan, (41.377491, 64.585262, 0.0))]'
IGNFrance requires access code, or not intended for this purpose
LiveAddress requires access code, or not intended for this purpose
NaviData reverse unsupported?
Nominatim b'Tomdi Tumani, Navoiy Viloyati, Ozbekiston'
OpenCage requires access code, or not intended for this purpose
OpenMapQuest reverse unsupported?
Photon reverse unsupported?
SERVICE_TO_GEOCODER requires access code, or not intended for this purpose
What3Words requires access code, or not intended for this purpose
YahooPlaceFinder requires access code, or not intended for this purpose
Yandex b'[Location( , , (42.052267, 65.243642, 0.0)), Location(, (42.004874, 64.330882, 0.0)), Location(None, (41.765066, 63.150118, 0.0))]'

GoogleV3 and Nominatim do what you seem to want without any extra setup. Where the output says 'requires access code, or not intended for this purpose', that usually means you need a key or login.
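
Whichever service you end up with, every lookup is a network round trip, so for 500K+ points it may help to avoid repeating lookups for points that are close together. Here is a minimal sketch of that idea, assuming nearby points can share one result: coordinates are rounded and the country is cached per rounded pair. The names make_cached_country_lookup, reverse_country and precision are my own inventions, not part of geopy; reverse_country stands for whatever function does a single lookup (findCountry from the question, for instance). Rounding to 2 decimal places merges points within roughly a kilometre of each other, so points near a border could get the wrong country.

```python
from functools import lru_cache


def make_cached_country_lookup(reverse_country, precision=2):
    """Wrap a (lat, lon) -> country callable with rounding and caching.

    reverse_country is a hypothetical stand-in for the function that
    performs one real lookup. precision is the number of decimal places
    kept before the lookup; it is an assumption to tune for your data.
    """

    @lru_cache(maxsize=None)
    def cached(lat_rounded, lon_rounded):
        # Runs once per distinct rounded coordinate pair.
        return reverse_country(lat_rounded, lon_rounded)

    def lookup(lat, lon):
        # Nearby points collapse onto the same cache key.
        return cached(round(lat, precision), round(lon, precision))

    return lookup
```

With the question's data frame you would build lookup = make_cached_country_lookup(findCountry) once and then pass lookup to df3.apply in place of findCountry. How much this saves depends entirely on how clustered the points are.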

Bill Bell