Aggregating a Column of String Values

Question

I am trying to group a data frame by two columns (agency name and year) to create a new data frame containing the sum of my numeric variables and the string variables. However, when I use the group command in dplyr all of my string variables are dropped.

I have a data frame containing data on transportation grants. The variables consist of the grant funding, a dummy indicating if the grant was awarded, the agency that applied, the year, and the state that the agency is located in. Each agency is located in a unique state. I would like to group the data by the agency and the year (because some agencies submitted multiple applications in the same year) and create a new data frame containing the agency, the sum of grant funds, the sum of awarded grants, the year, and the state.

I have tried the following code:

library(tidyverse)

Applicant_Name <- Applications %>%
  group_by(ApplicantName, Year) %>%
  select(everything()) %>%
 summarise_if(is.numeric, sum, na.rm=TRUE)

However, when I run this code the state variable is dropped from the data frame. I have also considered creating another data frame with the string variable via the aggregate command. Then merging this data frame with the Applicant_Name data frame. I used the following code:

test2 <- aggregate(test$State, by = list(test$ApplicantName, test$Year), paste(), 
                   collapse = " ")

However, this code creates a State column where the state names are listed in each cell as many times as the applicant name appears in the test data frame.

Since we don't have your data, we can't run your code, and we can't see any output, so the best anybody can do is guess. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that's easy for folks to help with. — camille, Aug 28 '19 at 21:06

score 0 · Answer 1 · answered Sep 27 '19 at 12:34

I have interpreted your data description to my best effort, and come up with the following:

dt = data.table(
  grant = sample(seq(10000, 500000, by = 10000), 200, T),
  agency = sample(LETTERS[1:3], 200, T),
  awarded = sample(c(T,F), 200, T),
  year = sample(1990:2010, 200, T),
  state = sample(letters[1:10], 200, T)
)

> head(dt)
    grant agency awarded year state
1: 100000      A   FALSE 2007     j
2: 190000      C    TRUE 2006     b
3: 390000      C   FALSE 1992     h
4: 450000      B   FALSE 2002     d
5:  70000      B   FALSE 1996     d
6: 130000      B   FALSE 2000     g

The following code produces the sum of (i) applied grants and (ii) awarded grants for each agency-state-year group.

mat = dt[, .(SumApplied = sum(grant), SumAward = sum(awarded*grant)), keyby = .(agency, year, state)]


> mat
     agency year state SumApplied SumAward
  1:      A 1990     f    1160000   660000
  2:      A 1990     g     360000   360000
  3:      A 1990     j     330000        0
  4:      A 1991     a     800000   800000
  5:      A 1991     d    1120000        0
 ---                                      
169:      C 2009     c     140000   140000
170:      C 2010     a     490000        0
171:      C 2010     b     330000        0
172:      C 2010     f      40000    40000
173:      C 2010     h     360000        0

I hope it helps!

Aggregating a Column of String Values

1 Answers1