3

There are several R packages that make working with US Census data easier. The two I use most frequently are tigris (for loading the spatial data) and acs (for loading the tabular data).

However, one problem I keep running into is that I can't figure out an efficient, reliable way to determine all of the tracts (or block groups, zip codes, etc.) within a Place without leaving the R console.

For instance, if I wanted to work with census tract data in Seattle, I would begin by using tigris::tracts to download the spatial data for King County, WA:

library(tigris)
library(dplyr)  # for glimpse()
# Download the tract boundaries for King County, WA
tr <- tigris::tracts(state = "WA", county = "King")

But unfortunately there's no obvious way to subset this data to include only the tracts in Seattle.

glimpse(tr)
Observations: 398
Variables: 12
$ STATEFP  (chr) "53", "53", "53", "53", "53", "53", "53", ...
$ COUNTYFP (chr) "033", "033", "033", "033", "033", "033", ...
$ TRACTCE  (chr) "003800", "021500", "032704", "026200", "0...
$ GEOID    (chr) "53033003800", "53033021500", "53033032704...
$ NAME     (chr) "38", "215", "327.04", "262", "327.03", "3...
$ NAMELSAD (chr) "Census Tract 38", "Census Tract 215", "Ce...
$ MTFCC    (chr) "G5020", "G5020", "G5020", "G5020", "G5020...
$ FUNCSTAT (chr) "S", "S", "S", "S", "S", "S", "S", "S", "S...
$ ALAND    (dbl) 624606, 3485578, 17160645, 15242622, 10319...
$ AWATER   (dbl) 0, 412526, 447367, 526886, 175464, 0, 4360...
$ INTPTLAT (chr) "+47.6794093", "+47.7643848", "+47.4940877...
$ INTPTLON (chr) "-122.2955292", "-122.2737863", "-121.7717...

Similarly, the acs package allows users to create subsets of census data using the geo.make function, but in my example this won't help unless I already have the list of tract GEOIDs for all of the Seattle tracts.
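To make the missing piece concrete, here is a minimal sketch of the subsetting step once such a vector exists. The small data frame stands in for the attribute table of `tr` and reuses values from the glimpse() output above; `seattle_geoids` is a hypothetical placeholder for the lookup result I don't know how to obtain.

```r
# Stand-in for the attribute table of `tr`, using values from glimpse() above
tr_data <- data.frame(
  STATEFP  = c("53", "53", "53"),
  COUNTYFP = c("033", "033", "033"),
  TRACTCE  = c("003800", "021500", "032704"),
  GEOID    = c("53033003800", "53033021500", "53033032704"),
  stringsAsFactors = FALSE
)

# A tract GEOID is the state, county and tract codes concatenated
built_geoid <- paste0(tr_data$STATEFP, tr_data$COUNTYFP, tr_data$TRACTCE)
stopifnot(all(built_geoid == tr_data$GEOID))

# Hypothetical placeholder: the GEOIDs of the tracts inside the Place
seattle_geoids <- c("53033003800")

# With that vector in hand, the subset itself is a one-liner
seattle_tr <- tr_data[tr_data$GEOID %in% seattle_geoids, ]
```

The same `%in%` filter should work on the spatial object itself, so the only real gap is a reliable source for `seattle_geoids`.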

For the record, I am aware that it is possible to determine this information elsewhere. This page in the Census.gov FAQ gives clear instructions on how to determine all the tracts in a given census Place. But given that this is a crucial step in many census-related analyses, it would be best if there were a convenient way to do it from the R console.

Thanks in advance.

Edit

Although this question deals with spatial data, I am most interested in finding a non-spatial solution. For instance, I would prefer a solution that queries the Census API and returns a vector of the desired GEOIDs over one that uses a spatial analysis tool (e.g., rgeos::gIntersects) to create the vector. Why? Because spatial approaches are more prone to error in this process, and this is known information we're talking about, not something that needs to be inferred spatially.

Tiernan
  • I don't think so. Cities don't fit within the census [geographic hierarchy](https://www.census.gov/geo/reference/webatlas/). If you can do it outside R, why don't **you** develop the method to do so inside R? Others might also find this helpful – alexwhitworth Mar 30 '16 at 17:34
  • @Alex Within the Census hierarchy, cities fall into the 'Places' category. The way to get this information outside of `R` (see the link I provided above) requires interacting with the [American FactFinder interface](http://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t). That seems like it would be a very tricky process to write into an `R` function, but if you have thoughts I'd be happy to hear them. – Tiernan Mar 30 '16 at 17:47
  • Gotcha. Still, the overall point stands-- tracts are a lower level of the hierarchy from States; so, there's no mapping between tracts and places.... to your thoughts on writing the code: unfortunately, it's not a problem I'm interested in (not intentionally being rude), so I do not have thoughts on it. – alexwhitworth Mar 30 '16 at 18:00
  • @Alex right - not possible using the hierarchy GEOID codes. But as the [description](https://ask.census.gov/faq.php?id=5000&faqId=1605) shows, this is known information we're talking about. So while we can't extract it using the Hierarchy of Census Geographic Entities, it is still information that should be accessible somehow (the Census API seems like the best bet to me). – Tiernan Mar 30 '16 at 18:17

3 Answers

3

I often need the same kind of data, so I wrote an R package to do this job. The package is called totalcensus. You can find it here: https://github.com/GL-Li/totalcensus.

With this package, you can easily get data at the tract, block group, or block level for towns, cities, counties, metro areas, and all other geographic areas. For example, if you want the race data at the block group level for various areas from the 2011-2015 ACS 5-year survey, simply run code like the following:

mixed <- read_acs5year(
    year = 2015,
    states = c("ut", "ri"),
    table_contents = c(
        "white = B02001_002",
        "black = B02001_003",
        "asian = B02001_005"
    ),
    areas = c(
        "Lincoln town, RI",
        "Salt Lake City city, UT",
        "Salt Lake City metro",
        "Kent county, RI",
        "COUNTY = UT001",
        "PLACE = UT62360"
    ),
    summary_level = "block group"
)

It returns data like:

#                      area               GEOID        lon      lat state population white black asian GEOCOMP SUMLEV                                                             NAME
#    1:    Lincoln town, RI 15000US440070115001  -71.46686 41.94419    RI       1561  1386   128    47     all    150 Block Group 1, Census Tract 115, Providence County, Rhode Island
#    2:    Lincoln town, RI 15000US440070115002  -71.47159 41.96754    RI        916   806    97     0     all    150 Block Group 2, Census Tract 115, Providence County, Rhode Island
#    3:    Lincoln town, RI 15000US440070115003  -71.47820 41.96364    RI       2622  2373    77    86     all    150 Block Group 3, Census Tract 115, Providence County, Rhode Island
#    4:    Lincoln town, RI 15000US440070115004  -71.47830 41.97346    RI       1605  1516    43     0     all    150 Block Group 4, Census Tract 115, Providence County, Rhode Island
#    5:    Lincoln town, RI 15000US440070116001  -71.44665 41.93120    RI        948   764     0     0     all    150 Block Group 1, Census Tract 116, Providence County, Rhode Island
# ---                                                                                                                                                                               
# 1129: Providence city, UT 15000US490050012011 -111.82424 41.69198    UT       2018  1877     0     0     all    150            Block Group 1, Census Tract 12.01, Cache County, Utah
# 1130: Providence city, UT 15000US490050012012 -111.80736 41.69323    UT       1486  1471     0     0     all    150            Block Group 2, Census Tract 12.01, Cache County, Utah
# 1131: Providence city, UT 15000US490050012013 -111.81310 41.65837    UT       1563  1440    15     0     all    150            Block Group 3, Census Tract 12.01, Cache County, Utah
# 1132: Providence city, UT 15000US490050012022 -111.85231 41.68674    UT       3894  3594     0     0     all    150            Block Group 2, Census Tract 12.02, Cache County, Utah
# 1133: Providence city, UT 15000US490059801001 -111.64525 41.67498    UT        118   118     0     0     all    150             Block Group 1, Census Tract 9801, Cache County, Utah
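If what you need is the vector of tract GEOIDs rather than the block group rows, it can be derived from the GEOID column shown above. This is a minimal sketch in base R, assuming the GEOID format in that output (a summary-level prefix ending in "US", followed by state, county, tract, and block group codes); the input vector is copied from the first five rows.

```r
# Block group GEOIDs copied from the first rows of the output above
bg_geoid <- c(
  "15000US440070115001",
  "15000US440070115002",
  "15000US440070115003",
  "15000US440070115004",
  "15000US440070116001"
)

# Drop everything up to and including "US", leaving
# state (2) + county (3) + tract (6) + block group (1)
fips <- sub("^.*US", "", bg_geoid)

# The first 11 characters identify the tract; keep each tract once
tract_geoid <- unique(substr(fips, 1, 11))
tract_geoid
```

For these five rows the result is the two Providence County tracts, 115 and 116.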
GL_Li
  • Looks like a fire package! I don't have 200GB spare memory on my machine but once I do I'll be sure to take it for a spin. – Tiernan Dec 04 '17 at 16:15
  • The above example uses the 2015 ACS 5-year survey data, which is about 50GB. You do not need to download the Census 2010 data if you don't really use it. – GL_Li Dec 04 '17 at 17:32
  • Ah - that's good to know. 50GB will still exceed the memory of my cloud setup, but I can probably give it a try on my old laptop. – Tiernan Dec 04 '17 at 17:46
  • If you only care about data in a couple of states, you can download data for just those states, for example `download_census("acs5year", 2015, c("MA", "CT"))`, and download the data generated from Census 2010 with `download_generated_data()`, which is about 120 MB. – GL_Li Dec 04 '17 at 18:33
  • @GL_Li, which data should I download if I want to create a mapping of all cities to census tracts? Kindly see this question for more details: https://stackoverflow.com/q/75893626/4613606 – Gaurav Singhal Mar 30 '23 at 23:12
1

Using the ggmap package, we can do reverse geocoding to get information from the lat/long points in your data. This will create a vector containing the city name for each data point.

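library(ggmap)  # revgeocode() needs a Google API key, set via register_google()
city <- vector(mode = "character", length = nrow(tr@data))
for (i in seq_len(nrow(tr@data))) {
    # revgeocode() expects c(longitude, latitude)
    lonlat <- c(as.numeric(tr@data$INTPTLON[i]), as.numeric(tr@data$INTPTLAT[i]))
    # The city is the second comma-separated piece of the returned address
    city[i] <- strsplit(revgeocode(lonlat), ", ")[[1]][2]
}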
head(city)
[1] "Seattle"          "Lake Forest Park" "North Bend"       "Tukwila"
      "Snoqualmie"       "Woodinville"
TomNash
  • This solution works. Unfortunately, the process of georeferencing all those points is pretty darn time consuming. I have updated my question to indicate a preference for *non-spatial* solutions. We'll see if anything comes of it. Thanks @TomNash! – Tiernan Mar 30 '16 at 18:10
  • @TomNash, this code has multiple issues: first, `tr@data` throws an error; then `revgeocode` requires an API key; and the response format has changed so much that it no longer provides the city. – Gaurav Singhal Mar 30 '23 at 22:06
0

Adding this answer in case it's useful to others running into this problem. You will need to leave the R console, but there is a great tool for this exact issue: the University of Missouri Census Data Center's Geocorr application. You can select Census place as the source geography and Census block as the target geography and the application will generate a neat CSV with a correlation list showing all the Census blocks in each Census place.
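Once the CSV is downloaded, turning it into a vector of GEOIDs in R might look like the sketch below. The column names (`county`, `tract`, `placenm`), the value formats, and the single header row are assumptions for illustration, so check them against your actual Geocorr file; the inline text stands in for the downloaded file, and tract is used as the target geography to match the question.

```r
# Inline stand-in for a downloaded Geocorr CSV. Column names and values
# are assumptions for illustration, not actual Geocorr output.
geocorr_csv <- "county,tract,placenm
53033,0038.00,Seattle city
53033,0215.00,Lake Forest Park city
53033,0327.04,North Bend city"

# Read every column as character so leading zeros survive
xwalk <- read.csv(text = geocorr_csv, colClasses = "character")

# Keep the rows for the Place of interest
seattle <- xwalk[xwalk$placenm == "Seattle city", ]

# Assumed formats: 5-digit county FIPS plus a tract code with a decimal
# point; removing the point gives the 11-digit tract GEOID
seattle_geoids <- unique(
  paste0(seattle$county, gsub(".", "", seattle$tract, fixed = TRUE))
)
seattle_geoids
```

With a real file, replace `text = geocorr_csv` with the path to the CSV.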

Kumar