-1

I have a large dataset containing the names of hospitals, the hospital groups and then the number of presenting patients by month. I'm trying to use dplyr to create a summary that contains the total number of presenting patients each month, aggregated by hospital group. The data frame looks like this:

Hospital | Hospital_group | Jan 03 | Feb 03 | Mar 03 | Apr 03 | .....
---------------------------------------------------------------
Hosp 1   | Group A        |    5   |    5   |    6   |    4   | .....
---------------------------------------------------------------
Hosp 2   | Group A        |    6   |    3   |    8   |    2   | .....
---------------------------------------------------------------
Hosp 3   | Group B        |    5   |    5   |    6   |    4   | .....
---------------------------------------------------------------
Hosp 4   | Group B        |    3   |    7   |    2   |    1   | .....
---------------------------------------------------------------

I'm trying to create a new dataframe that looks like this:

Hospital_group |Jan 03 | Feb 03 | Mar 03 | Apr 03 | .....
----------------------------------------------------------
Group A        |   11  |    8   |    14  |   6    | .....
----------------------------------------------------------
Group B        |   8   |    12  |     8  |   5    | .....
----------------------------------------------------------

I'm trying to use dplyr to summarise the data but am a little stuck (am very new at this as you might have guessed). I've managed to filter out the first column (hospital name) and group_by the hospital group but am not sure how to get a cumulative sum total for each month and year (there is a large number of date columns so I'm hoping there is a quick and easy way to do this).

Sorry about posting such a basic question - any help or advice would be greatly appreciated.

Greg

Greg Martin
  • 243
  • 3
  • 5
  • 17

2 Answers2

6

Use summarize_all: Example:

df <- tibble(name=c("a","b", "a","b"), colA = c(1,2,3,4), colB=c(5,6,7,8))
df

# A tibble: 4 × 3
   name  colA  colB
  <chr> <dbl> <dbl>
1     a     1     5
2     b     2     6
3     a     3     7
4     b     4     8

df %>% group_by(name) %>% summarize_all(sum)

Result:

# A tibble: 2 × 3
   name  colA  colB
  <chr> <dbl> <dbl>
1     a     4    12
2     b     6    14

Edit: In your case, your data frame contains one column that you do not want to aggregate (the Hospital name.) You might have to either deselect the hospital name column first, or use summarize_at(vars(-Hospital), funs(sum)) instead of summarize_all.

hdkrgr
  • 1,666
  • 1
  • 12
  • 22
2

We can do this using base R

We split the dataframe by Hospital_group and then sum it column-wise.

do.call(rbind, lapply(split(df[-c(1, 2)], df$Hospital_group), colSums))


#          Jan_03 Feb_03 Mar_03 Apr_03
#Group_A     11      8     14      6
#Group_B      8     12      8      5
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213