
I have a dataset with longitude and latitude coordinates, and I want to retrieve the census tract for each point. Is there a dataset or API that would allow me to do this?

My dataset looks like this:

       lat       lon   
1 40.61847 -74.02123   
2 40.71348 -73.96551   
3 40.69948 -73.96104    
4 40.70377 -73.93116   
5 40.67859 -73.99049   
6 40.71234 -73.92416   

I want to add a column with the corresponding census tract.

Final output should look something like this (these are not the right numbers, just an example).

       lat       lon     Census_Tract_Label   
1 40.61847 -74.02123                   5.01
2 40.71348 -73.96551                     20
3 40.69948 -73.96104                     41
4 40.70377 -73.93116                  52.02
5 40.67859 -73.99049                     58
6 40.71234 -73.92416                     60
nak5120

1 Answer


The tigris package includes a function called `call_geolocator_latlon` that should do what you're looking for. Here is some code using the coordinates from your question:

    > library(tigris)
    > coord <- data.frame(lat = c(40.61847, 40.71348, 40.69948, 40.70377, 40.67859, 40.71234),
    +                     long = c(-74.02123, -73.96551, -73.96104, -73.93116, -73.99049, -73.92416))
    > 
    > coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long']))
    > coord
           lat      long     census_code
    1 40.61847 -74.02123 360470152003001
    2 40.71348 -73.96551 360470551001009
    3 40.69948 -73.96104 360470537002011
    4 40.70377 -73.93116 360470425003000
    5 40.67859 -73.99049 360470077001000
    6 40.71234 -73.92416 360470449004075
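
One caveat with `apply()`: it first converts the data frame to a matrix, so if `coord` ever contains a character column (for example `census_code` after the first run), every `row['lat']` becomes a string. A slightly more defensive variant passes the numeric columns directly. The helper name `lookup_block_geoids` is hypothetical, and this is only a minimal sketch that assumes `call_geolocator_latlon` takes latitude and longitude as its first two arguments, as in the code above. It still makes one request per point, so it is not truly vectorized.

```r
# Hypothetical helper: geocode each (lat, lon) pair to a census block GEOID,
# one API call per point. `geocode_fn` defaults to tigris::call_geolocator_latlon
# (assumed signature: lat, lon, ...); extra named arguments in `...` are
# forwarded to it unchanged.
lookup_block_geoids <- function(lat, lon, ...,
                                geocode_fn = tigris::call_geolocator_latlon) {
  stopifnot(length(lat) == length(lon))
  extra <- list(...)
  vapply(seq_along(lat), function(i) {
    res <- do.call(geocode_fn, c(list(lat[i], lon[i]), extra))
    # Treat an empty result as missing so vapply always gets one string back
    if (length(res) == 0) NA_character_ else as.character(res[[1]])
  }, character(1))
}
```

Usage would be `coord$census_code <- lookup_block_geoids(coord$lat, coord$long)`. Because `geocode_fn` is an argument, a stub function can be swapped in to test the surrounding code without hitting the API.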

As I understand it, the 15-digit code is a census block GEOID made of several codes put together: the first two digits are the state, the next three the county, the following six the tract, and the last four the block. To get just the census tract code, use the `substr` function to pull out characters 6 through 11.
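
To make that layout concrete, here is a small sketch that splits each block GEOID into its parts by character position. The function name `split_block_geoid` is hypothetical; the block group column relies on the block group being the first digit of the four-digit block code, and `tract_geoid` (state + county + tract, the first 11 characters) is the form usually needed to join against tract-level data.

```r
# Hypothetical helper: split 15-digit census block GEOIDs into components.
# Character positions: state 1-2, county 3-5, tract 6-11, block 12-15;
# the block group is the first digit of the block (position 12).
split_block_geoid <- function(geoid) {
  geoid <- as.character(geoid)
  data.frame(
    state       = substr(geoid, 1, 2),
    county      = substr(geoid, 3, 5),
    tract       = substr(geoid, 6, 11),
    block_group = substr(geoid, 12, 12),
    block       = substr(geoid, 12, 15),
    tract_geoid = substr(geoid, 1, 11),  # unique tract ID across states/counties
    stringsAsFactors = FALSE
  )
}
```

For the first row above, `split_block_geoid("360470152003001")` gives state `36` (New York), county `047` (Kings County), tract `015200`, block group `3` and block `3001`.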

    > coord$census_tract <- substr(coord$census_code, 6, 11)  # characters 6-11 = tract
    > coord
           lat      long     census_code census_tract
    1 40.61847 -74.02123 360470152003001       015200
    2 40.71348 -73.96551 360470551001009       055100
    3 40.69948 -73.96104 360470537002011       053700
    4 40.70377 -73.93116 360470425003000       042500
    5 40.67859 -73.99049 360470077001000       007700
    6 40.71234 -73.92416 360470449004075       044900
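
The six-digit code is not quite the `Census_Tract_Label` format from the question (e.g. `5.01`, `52.02`). Tract codes carry an implied decimal point: four digits of basic number plus a two-digit suffix, and the readable name drops leading zeros and omits a `00` suffix. A minimal sketch of that conversion, with a hypothetical function name:

```r
# Hypothetical helper: convert 6-digit tract codes to readable tract labels.
# "015200" -> "152", "000501" -> "5.01", "005202" -> "52.02"
tract_label <- function(tract_code) {
  basic  <- as.integer(substr(tract_code, 1, 4))  # drops leading zeros
  suffix <- substr(tract_code, 5, 6)
  ifelse(suffix == "00", as.character(basic), paste0(basic, ".", suffix))
}
```

Applied as `coord$Census_Tract_Label <- tract_label(coord$census_tract)`, the six codes above become 152, 551, 537, 425, 77 and 449.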

I hope that helps!

Danny Farnand
  • Is there any way to vectorize `call_geolocator_latlon`? I'd like to do this but for a relatively large number (200,000) of coordinates. – mlinegar Sep 09 '18 at 20:07
  • This specific function looks like it only makes a single API call at a time. The [API Documentation](https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.pdf) from the Census mentions batch geocoding by sending a specially formatted CSV. The example curl command they give is: `curl --form addressFile=@localfile.csv --form benchmark=9 https://geocoding.geo.census.gov/geocoder/locations/addressbatch --output geocoderesult.csv` – Danny Farnand Sep 12 '18 at 15:19
  • This is very helpful! If I wanted to specify a specific vintage, what modification would I need to make? I've tried `coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long'], vintage = 2010))` and `coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long'], rep(na, nrow(coord)), rep(2010, nrow(coord)))`, as well as adding `benchmark` and `vintage` columns and then doing `coord$census_code <- apply(coord, 1, function(row) call_geolocator_latlon(row['lat'], row['long'], row['benchmark'], row['vintage'])` – cskn Feb 12 '20 at 23:27