I want to convert county and state names to coordinates.

My `county` DataFrame looks like this:

```
fips   state_fips  county_fips  state       county
1000   1           0            Alabama     Alabama
1005   1           5            Alabama     Barbour County
1007   1           7            Alabama     Bibb County
1009   1           9            Alabama     Blount County
1011   1           11           Alabama     Bullock County
6085   6           85           California  Santa Clara County
6089   6           89           California  Shasta County
6091   6           91           California  Sierra County
32021  32          21           Nevada      Mineral County
32023  32          23           Nevada      Nye County
32027  32          27           Nevada      Pershing County
32029  32          29           Nevada      Storey County
```

I want to use the Python geopy package:

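```python
from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np

# "my-county-geocoder" is a placeholder; recent geopy versions require
# Nominatim to be given a custom user_agent.
geolocator = Nominatim(user_agent="my-county-geocoder")

# county is the DataFrame shown above
county['coord'] = county['county'].apply(geolocator.geocode)
```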
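Because only the `county` column is passed to `geocode`, county names that exist in more than one state can resolve to the wrong place (for example, "Bibb County" exists in both Alabama and Georgia). A minimal sketch of building a less ambiguous query from both columns, assuming a "County, State, USA" query format suits Nominatim for your data; the helper name `build_query` and the ", USA" suffix are my own illustrative choices, not part of geopy:

```python
def build_query(county: str, state: str) -> str:
    """Combine a county name and a state name into one geocoding query string.

    Hypothetical helper: the "County, State, USA" format is an assumption.
    """
    county = county.strip()
    state = state.strip()
    # State-level rows (e.g. fips 1000, where the county column is just "Alabama")
    # only need the state name.
    if county == state:
        return f"{state}, USA"
    return f"{county}, {state}, USA"


print(build_query("Bibb County", "Alabama"))  # Bibb County, Alabama, USA
print(build_query("Alabama", "Alabama"))      # Alabama, USA
```

With pandas this could be applied to every row, e.g. `county['query'] = [build_query(c, s) for c, s in zip(county['county'], county['state'])]`, and you would then geocode `county['query']` instead of `county['county']`.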

I got the error:

```
socket.timeout: timed out
During handling of the above exception, another exception occurred:
urllib.error.URLError: <urlopen error timed out>
During handling of the above exception, another exception occurred:
geopy.exc.GeocoderTimedOut: Service timed out
```

How should I fix it? Thanks.

zilong
  • The above code works for me. Can you please share a sample of your data frame showing how it's structured? – jits_on_moon Jul 18 '18 at 03:40
  • fips state_fips county_fips state county 0 1000 1 0 Alabama Alabama 1 1001 1 1 Alabama Autauga County 2 1003 1 3 Alabama Baldwin County. It is the same as in the question. So strange. – zilong Jul 18 '18 at 03:46
  • I am getting output like this: 0 (Alabama, United States of America, (33.258881... 1 (Barbour County, Alabama, United States of Ame... 2 (Bibb County, Georgia, United States of Americ... Name: coord, dtype: object – jits_on_moon Jul 18 '18 at 03:49
  • My Python 3 is up to date, and I have tried restarting Python and running it again. I have no idea why different computers would give different output. – zilong Jul 18 '18 at 03:53
  • The only reason I can think of is the representation of your data frame object. – jits_on_moon Jul 18 '18 at 03:54
  • I copied the first 5 observations of the county dataset (which is large) to another dataframe for testing. When I add .copy(), I don't see the error any more. However, I now get a new error: geopy.exc.GeocoderTimedOut: Service timed out. Do you know the reason? – zilong Jul 18 '18 at 04:02
  • That means the request to the Nominatim server timed out. Initialize Nominatim with a higher timeout than the default of 1 second; try this: geolocator = Nominatim(timeout=3) – jits_on_moon Jul 18 '18 at 04:03
  • My real county dataset has several thousand observations. How should I handle that? – zilong Jul 18 '18 at 08:23
  • The Nominatim Usage Policy [restricts the query rate to 1 rps](https://operations.osmfoundation.org/policies/nominatim/). If that's too low, you should use another service (or provision your own local instance of Nominatim). If that's enough, simply add a `sleep(1)` call between the queries. – KostyaEsmukov Jul 19 '18 at 08:46
  • This looks like a duplicate of https://stackoverflow.com/questions/38031705/http-error-429-too-many-requests-by-python-geopy – KostyaEsmukov Jul 19 '18 at 08:50
  • `county['coord'] = county['county'].apply(geolocator.geocode)` geocodes every location in the "county" column in a single statement. Where should I add `sleep(1)` in that command? – zilong Jul 20 '18 at 08:07
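For a run of several thousand rows, an occasional `GeocoderTimedOut` can still abort the whole `apply`. A minimal sketch of retrying a single query a few times before giving up, assuming that returning `None` for a row that keeps timing out is acceptable; `call_with_retries` and its parameters are hypothetical names, and the exception types to retry on are passed in so the sketch does not depend on geopy being installed:

```python
import time
from typing import Callable, Optional, Tuple, Type


def call_with_retries(
    func: Callable[[str], object],
    query: str,
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """Call func(query), retrying up to max_retries times on the given exceptions.

    Hypothetical helper. Returns None if every attempt fails, so one bad row
    does not stop a long run.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(query)
        except retry_on:
            if attempt == max_retries:
                return None
            # Wait a little longer after each failed attempt before retrying.
            sleep(wait_seconds * (attempt + 1))
    return None
```

With geopy this might be used as `call_with_retries(geolocator.geocode, q, retry_on=(GeocoderTimedOut,))`, with `GeocoderTimedOut` imported from `geopy.exc`. The injectable `sleep` parameter exists only so the waiting can be tested without real delays.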

1 Answer

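```python
from time import sleep
from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np

# Placeholder user_agent; recent geopy versions require a custom one for Nominatim
geolocator = Nominatim(user_agent="my-county-geocoder")

def geocode_with_sleep(query):
    # Nominatim's usage policy allows at most 1 request per second,
    # so pause before every request.
    sleep(1)
    return geolocator.geocode(query)

county['coord'] = county['county'].apply(geocode_with_sleep)
```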
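`sleep(1)` always waits a full second on top of the time the request itself takes. A minimal sketch of an alternative that waits only for whatever remains of the interval since the previous call started; `MinIntervalLimiter` is a hypothetical helper. geopy also ships a ready-made wrapper for this, `geopy.extra.rate_limiter.RateLimiter`, which additionally retries failed calls:

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Wrap a function so consecutive calls start at least min_interval seconds apart.

    Hypothetical helper; clock and sleep are injectable so it can be tested
    without real delays.
    """

    def __init__(
        self,
        func: Callable,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.func = func
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last_start: Optional[float] = None

    def __call__(self, *args, **kwargs):
        if self._last_start is not None:
            # Sleep only for the part of the interval that has not already elapsed.
            remaining = self.min_interval - (self.clock() - self._last_start)
            if remaining > 0:
                self.sleep(remaining)
        self._last_start = self.clock()
        return self.func(*args, **kwargs)
```

Usage would look like `county['coord'] = county['county'].apply(MinIntervalLimiter(geolocator.geocode, min_interval=1.0))`. With geopy's own wrapper the equivalent is `RateLimiter(geolocator.geocode, min_delay_seconds=1)`.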

Another approach, a rate-limited queue, is described in http://www.jackmaney.com/2015/01/09/geocoding-rate-limited-queue/

KostyaEsmukov