
I have a large dataframe with many variables (categorical and continuous). The rows fall into groups that are identical in every variable except one continuous variable, whose values are similar but not equal within a group. I would like to take the mean of that variable so that each group of observations collapses into a single row.

I tried `group_by_all` and `summarise` in dplyr, but they didn't produce the desired result. I have the following dummy code:

vy   <- c('a', 'a', 'c', 'c')
cat  <- c('b', 'b', 'd', 'd')
var  <- c(1, 1.3, 2, 2.5)
var1 <- c(1, 1, 2, 2)
df   <- data.frame(vy, cat, var, var1)

The expected result is for `var` to be averaged within each group (i.e. among rows that are identical in all other variables). For example:

vy  cat  var   var1
a   b    1.15  1
c   d    2.25  2

Note: there are some missing values in the dataframe. Any help would be appreciated.

Tom
  • Isn't this `aggregate(var ~ vy + cat + var1, df, mean, na.rm = TRUE)`, or `df %>% group_by(vy, cat, var1) %>% summarise(var = mean(var, na.rm = TRUE))` in `dplyr`? – Ronak Shah Nov 08 '19 at 00:09
  • Thanks. Is there a way to aggregate by all the other columns at once? I have 100 variables in the actual job. – Tom Nov 08 '19 at 00:25
  • Maybe use `group_by_at` to select the grouping columns by position instead of by name, so you can pass a range of integers to it: `df %>% group_by_at(c(1:2, 4)) %>% summarise(var = mean(var, na.rm = TRUE))` – Ronak Shah Nov 08 '19 at 00:33
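
For the many-column case raised in the comments, one option that avoids listing names or positions is the formula shorthand `var ~ .` in base R's `aggregate`, where `.` stands for every column of the data frame other than `var`. A minimal sketch on the dummy data from the question (the name `res` is only illustrative):

```r
# Dummy data from the question, built directly inside data.frame()
df <- data.frame(
  vy   = c('a', 'a', 'c', 'c'),
  cat  = c('b', 'b', 'd', 'd'),
  var  = c(1, 1.3, 2, 2.5),
  var1 = c(1, 1, 2, 2)
)

# `.` on the right-hand side means "all columns except var",
# so every other column is used as a grouping variable
res <- aggregate(var ~ ., data = df, FUN = mean)
print(res)
```

On missing values: the formula interface uses `na.action = na.omit` by default, so rows with an `NA` in any column are dropped before averaging, and `aggregate` also leaves out rows whose grouping columns contain `NA`. It is worth checking whether that matches what you want on the real data. With dplyr 1.0.0 or later, `group_by(across(-var))` should express the same grouping.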
