0

I have data that looks like below, that has 1 Date variable, 4 numerical variables and 6 categorical variables.

          Date      Department    average_wait_time_min patients_count doctors_count nurses_count year quarters       months fortnights   weeks
    1  2017-01-01 General Checkup                   211            847           165          266 2017  Q1-2017 January-2017    1 -2017 01-2017
    2  2017-01-01 General Checkup                   164             73            14           26 2017  Q1-2017 January-2017    1 -2017 01-2017
    3  2017-01-01       Emergency                    52             90            27           35 2017  Q1-2017 January-2017    1 -2017 01-2017
    4  2017-01-01       Emergency                    42             20             2            6 2017  Q1-2017 January-2017    1 -2017 01-2017
    25 2017-01-02 General Checkup                   250            811           160          261 2017  Q1-2017 January-2017    1 -2017 01-2017
    26 2017-01-02 General Checkup                   166             75            17           20 2017  Q1-2017 January-2017    1 -2017 01-2017
    27 2017-01-02       Emergency                    47             89            24           39 2017  Q1-2017 January-2017    1 -2017 01-2017
    28 2017-01-02       Emergency                    57             18             3            5 2017  Q1-2017 January-2017    1 -2017 01-2017

I want to sum(add) columns that has numerical data that belong to same date like below(also preserving categorical data). expected output(not with blank rows of course):

              Date      Department    average_wait_time_min patients_count doctors_count nurses_count year quarters       months fortnights   weeks
        1  2017-01-01 General Checkup                   375            920           179          292 2017  Q1-2017 January-2017    1 -2017 01-2017

        3  2017-01-01       Emergency                    94            110            29           41 2017  Q1-2017 January-2017    1 -2017 01-2017

        25 2017-01-02 General Checkup                   416            886           177          281 2017  Q1-2017 January-2017    1 -2017 01-2017

        27 2017-01-02       Emergency                   104            107            27           44 2017  Q1-2017 January-2017    1 -2017 01-2017

I've thought of aggregate function since it fits my requirements really close. But, It applies a function(sum in my case) only on one column based on the repeated observations in other variables and this stackoverflow question talks about single numerical column too. But, I want to to sum multiple columns. So, that wouldn't help me.(here's the complete data if you want it)

How do I do this?

Naveen Reddy Marthala
  • 2,622
  • 4
  • 35
  • 67
  • 1
    With `dplyr`, I'd recommend `your_data %>% group_by_if(is.numeric %>% Negate) %>% summarize_all(sum)`. I'm sure there is a good dupe out there... looking for it. – Gregor Thomas Aug 27 '19 at 18:01
  • you can see in my question that I've explained how my problem is different. Please remove that. the question that you thing is duplicate of mine has only one numerical column to add and only categorical column. So, please remove that duplicate mark. – Naveen Reddy Marthala Aug 27 '19 at 18:02
  • `df %>% mutate(year = as.factor(year)) %>% group_by_if(is.numeric %>% Negate) %>% summarize_all(sum)` Your year may be stored as numeric. – Ryan John Aug 27 '19 at 18:38

3 Answers3

1
library(dplyr)
summarize(group_by(put_your_dataframe_name_here, Date, Department, year, quarters, months, fortnights, weeks), sum(average_wait_time_min), sum(patients_count), sum(doctors_count), sum(nurses_count))
Monk
  • 407
  • 3
  • 8
1

Try the group_by summarise_at framework from dplyr:

library(tidyverse)
your_data %>%
  group_by(Date, Department, <all other char columns you wish to preserve>) %>%
  summarise_at(vars(average_wait_time_min, patients_count, doctors_count, nurses_count),
               sum) %>%
  ungroup()
Eric Stern
  • 11
  • 3
1

Here's an automatic way to group by all non-numeric columns and sum all numeric columns:

library(dplyr)
your_data %>% 
  group_by_if(is.numeric %>% Negate) %>%
  summarize_all(sum)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294