
This seems to be a common question, without any easily-digestible / easily-implementable answers. Many people reference the FCC API, but I don't know how to use an API and haven't found a simple explanation to help me in this situation. R code I can do, Python I can do (if it's simple), but it really seems like there should be some relatively simple resource for taking a .csv (or similar) with lat/long columns, and getting FIPS codes back (at the block group level, from the 2010 census).

Potential solutions (and my issues with them):

  • This GitHub repo, I believe, queries the old FCC API, which is decommissioned. Either way, when I run it on the example given it throws `Error in fromJSON(content, handler, default.size, depth, allowComments, : invalid JSON input`. Furthermore, I wonder how it would do when mapped over 16 million coordinates.
  • This SO question works great on a few rows, and I've implemented it for cases where I only need a couple thousand queries, but I've gotten the errors `Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: Send failure: Connection was reset` and `Error in call_geolocator_latlon(row["GE_LATITUDE_2010"], row["GE_LONGITUDE_2010"]) : Service Unavailable (HTTP 503)`, which I assume are due to my data being too big (a rough retry wrapper is sketched after this list).
  • The solution here doesn't seem like it would be best at first glance, since it involves downloading shapefiles, which just seems inefficient; but since I actually only have observations in CA it should work, except that when I change it to request 2010 block group geographies, it breaks:
    • ca <- tidycensus::get_decennial(state = "CA", geography = "block group", variables = "B00001_001", geometry = TRUE, year = 2010)
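
For reference, here is the kind of retry/throttle wrapper I have in mind around `tigris::call_geolocator_latlon()` (the wrapper name and the retry/pause settings below are just placeholders); it smooths over occasional timeouts and 503s, but is still far too slow for 16 million rows:

library(tigris)

# Retry a single lookup a few times before giving up (placeholder settings)
geocode_with_retry <- function(lat, lon, tries = 3, pause = 2) {
  for (k in seq_len(tries)) {
    out <- tryCatch(
      tigris::call_geolocator_latlon(lat, lon),
      error = function(e) NA_character_
    )
    if (length(out) == 1 && !is.na(out)) return(out)
    Sys.sleep(pause)  # back off before retrying
  }
  NA_character_
}

# The returned block FIPS is 15 digits; the first 12 are the block group
testdata$GEOID_BG <- substr(
  mapply(geocode_with_retry,
         testdata$GE_LATITUDE_2010, testdata$GE_LONGITUDE_2010),
  1, 12
)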

Ideally, I'd like to find/write a function that allows me to input the name of my dataframe and the columns that hold my latitude and longitude data, and that then adds a column with the FIPS code (at the block group level, from the 2010 census). Alternatively, somewhere I can just upload a .csv and get a .csv back would be great. Or a Python package that is easily implementable by someone with very limited Python knowledge. Etc, etc, etc.

sample dataframe (for R):

testdata <- structure(list(unique_id = c(5392085L, 14789082L, 11023930L, 4005454L, 13701322L, 10821557L, 11397828L, 15709999L, 475895L, 1546307L), GE_LATITUDE_2010 = c(38.272084, 33.013099, 39.019289, 33.992753, 32.6104, 33.717793, 34.550265, 32.842897, 33.754883, 38.461337), GE_LONGITUDE_2010 = c(-122.644619, -117.05967, -121.006352, -118.26259, -117.057227, -118.044996, -117.277502, -116.890541, -116.983093, -121.389269)), row.names = c(NA, -10L), class = "data.frame")
  • why do you think downloading shapefiles would be inefficient; have you tried it? – SymbolixAU Feb 18 '20 at 03:12
  • Downloading a shapefile of block groups in California might take a little bit, but nowhere near as long as geocoding 16 million rows. Plus it's free. It's helpful if you can be more specific than "tidycensus breaks"; luckily I work with Census data, and can notice right away that's an ACS variable number where you want a decennial one. If you don't actually need the census data, just the shapefile, download it from the Census TIGER site (or use `tigris`, which `tidycensus` calls) – camille Feb 18 '20 at 04:01
  • About the actual calculation: make a spatial object from your coordinates. I like `sf` for this. Take the shapefile of block groups (from the Census Bureau) and do a spatial overlay. If you no longer need the spatial data, just the ID, coords, and BG FIPS, drop the rest – camille Feb 18 '20 at 04:04
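
To make that overlay concrete, here is a minimal R sketch of the approach described in the comments. It assumes the `tigris` and `sf` packages (recent `tigris` versions return `sf` objects directly) and that `GEOID10` is the block-group FIPS field in the 2010 TIGER file:

library(tigris)
library(sf)

# 2010 block group boundaries for California (a one-time download)
ca_bg <- tigris::block_groups(state = "CA", year = 2010)

# Turn the question's testdata into an sf point object (lon/lat, WGS84),
# keeping the original columns
pts <- sf::st_as_sf(
  testdata,
  coords = c("GE_LONGITUDE_2010", "GE_LATITUDE_2010"),
  crs = 4326, remove = FALSE
)

# Match each point to the block group polygon it falls in;
# GEOID10 (assumed field name) is the 12-digit block-group FIPS
pts <- sf::st_join(
  sf::st_transform(pts, sf::st_crs(ca_bg)),
  ca_bg["GEOID10"]
)

result <- sf::st_drop_geometry(pts)  # unique_id, coords, and GEOID10

Because the join runs locally against one downloaded shapefile, it scales to millions of points far better than per-row API calls.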

1 Answer


If I understand your question correctly, you have lat and lon data and you want the FIPS codes associated with the coordinates.

To do that with Python, you can do the following.

Your sample df:

import pandas as pd

# same values as the R testdata above (plain integers; the R 'L' suffix isn't needed in Python)
unique_id = [5392085, 14789082, 11023930, 4005454, 13701322, 10821557,
             11397828, 15709999, 475895, 1546307]
GE_LATITUDE_2010 = [38.272084, 33.013099, 39.019289, 33.992753, 32.6104, 33.717793,
                    34.550265, 32.842897, 33.754883, 38.461337]
GE_LONGITUDE_2010 = [-122.644619, -117.05967, -121.006352, -118.26259, -117.057227,
                     -118.044996, -117.277502, -116.890541, -116.983093, -121.389269]

df = pd.DataFrame()
df['unique_id'] = unique_id
df['GE_LATITUDE_2010'] = GE_LATITUDE_2010
df['GE_LONGITUDE_2010'] = GE_LONGITUDE_2010

df


import requests
import pandas as pd

def get_fips_num(df):
    """Query the FCC Area API for each row and return a data frame of
    unique_id and the block FIPS code."""
    df_1 = df[['GE_LONGITUDE_2010', 'GE_LATITUDE_2010', 'unique_id']]
    fips_lst = []
    unique_id = []
    for lon, lat, uid in df_1.itertuples(index=False):
        try:
            link = 'https://geo.fcc.gov/api/census/area?lat={0}&lon={1}&format=json'.format(lat, lon)
            response = requests.get(link).json()
            # 'block_fips' is the 15-digit block code; its first 12 digits are the block group
            x = response['results'][0]['block_fips']
            if len(x) != 0:
                fips_lst.append(x)
                unique_id.append(uid)
        except Exception as error:
            print("error: " + str(error))

    df_result = pd.DataFrame()
    df_result['unique_id'] = unique_id
    df_result['fips'] = fips_lst
    return df_result

When you run the code on your df, you should get the data frame below:

    get_fips_num(df)

[Screenshot of the resulting data frame: https://i.stack.imgur.com/dERnA.png]