0

I am trying to consolidate a data frame I am getting from the US census bureau API. The data frame is the gini index for US metro areas. I run the api call on a loop for each year available (2010 - 2017) and generate a stacked data frame. However, I can't consolidate the data frame so that it has one row per metro area and one column for each year with the correct variable.

The file I currently get looks like this:

GEOID  |   NAME   |  VARIABLE  |   ESTIMATE |   MOE
____________________________________________________

01223     Metro1      B1059          .05       .02
01224     Metro2      B1059          .06       .01
01225     Metro3      B1059          .07       .03
01223     Metro1      B1059          .02       .02
01224     Metro2      B1059          .04       .04
01225     Metro3      B1059          .09       .09

What I am trying to get it to look like is this:

GEOID  |   NAME   |  Year-1 |   MOE-1  |  Year-2  | MOE-2
_________________________________________________________________________

01223     Metro1      .05       .02       .02      .02
01224     Metro2      .06       .01       .04      .04
01225     Metro3      .07       .03       .09      .09

I've tried various versions of piping/ group_by and aggregate but I always seem to throw an error or get the wrong result. The code I have that works is below. Any help would be appreciated!

{r, cache = TRUE}

require(censusapi)

year_block <- c(2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017)
census_api_key("xyz123")

metro_gini <- c()
for(i in 1:length(year_block)){
col_name <- paste0("Year-",i)
tmp_gini <- get_acs(geography = "metropolitan statistical area/micropolitan statistical area", table = "B19083", year = year_block[i])
metro_gini <- rbind(metro_gini, tmp_gini, col_name)
}

metro_gini <- subset(metro_gini, select = -c(variable)) %>%
  group_by(NAME)

head(metro_gini)

FrenchConnections
  • 361
  • 1
  • 3
  • 13
  • There is package information missing from your code. For example, are you using `library(dplyr)` or `library(magrittr)` or something else? What's the link for the API? – shea Apr 23 '19 at 22:44
  • 1
    You've got a 'reshape long to wide' problem here. You're just missing a counter for each time another occurrence of `GEOID+NAME` occurs. See here: https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format/11323251 which suggests something like - `dat$time <- ave(seq_len(nrow(dat)), dat[c("GEOID","NAME")], FUN=seq_along); reshape( dat, idvar=c("GEOID","NAME"), direction="wide", sep="-", drop="VARIABLE" )` - other answers involving packages are over at the linked question too. – thelatemail Apr 23 '19 at 22:47

0 Answers0