I am trying to group a data frame by two columns (agency name and year) to create a new data frame containing the sum of my numeric variables and the string variables. However, when I use the group command in dplyr all of my string variables are dropped.
I have a data frame containing data on transportation grants. The variables consist of the grant funding, a dummy indicating if the grant was awarded, the agency that applied, the year, and the state that the agency is located in. Each agency is located in a unique state. I would like to group the data by the agency and the year (because some agencies submitted multiple applications in the same year) and create a new data frame containing the agency, the sum of grant funds, the sum of awarded grants, the year, and the state.
I have tried the following code:
library(tidyverse)
Applicant_Name <- Applications %>%
group_by(ApplicantName, Year) %>%
select(everything()) %>%
summarise_if(is.numeric, sum, na.rm=TRUE)
However, when I run this code the state variable is dropped from the data frame. I have also considered creating another data frame with the string variable via the aggregate command. Then merging this data frame with the Applicant_Name data frame. I used the following code:
test2 <- aggregate(test$State, by = list(test$ApplicantName, test$Year), paste(),
collapse = " ")
However, this code creates a State column where the state names are listed in each cell as many times as the applicant name appears in the test data frame.