Hi there: Can anyone offer a layperson's explanation of why these two ways of calculating a row average of scores behave differently? Thanks.

library(tidyverse)
var1<-rnorm(100)
var2<-rnorm(100)
var3<-rnorm(100)

df<-data.frame(var1, var2, var3)

#ADD IN A MISSING VALUE
df[1,1]<-NA

#I thought this would work
df %>% 
  select(starts_with('var')) %>% 
  rowwise() %>% 
  mutate(avg=mean(., na.rm=T))

#This does work but I don't understand why
df %>% 
  rowwise() %>% 
  mutate(avg=
           mean(
             c_across(starts_with('var')), na.rm=T)
         )

spindoctor

1 Answer

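  • In your first attempt, . stands for the entire data frame that was piped into mutate(), not the current row, so rowwise() has no effect on it.
  • Moreover, mean() doesn't work on data frames: it returns NA with a warning instead of an average (try mean(mtcars)).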
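
To see the second point in isolation, here is a minimal base R sketch. The tiny data frame is made up for illustration and is not the one from the question.

```r
# Toy data, invented for this illustration only
toy <- data.frame(var1 = c(NA, 2), var2 = c(3, 4), var3 = c(5, 9))

# mean() on a data frame: not a numeric vector, so it returns NA
# (the warning is suppressed here to keep the output quiet)
whole_df_mean <- suppressWarnings(mean(toy, na.rm = TRUE))

# Flattening one row to a plain vector first gives a usable average
first_row_mean <- mean(unlist(toy[1, ]), na.rm = TRUE)

print(whole_df_mean)
print(first_row_mean)
```

The first call fails no matter how the data is grouped, which is why the pipe with . cannot produce a per-row average.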

Since dplyr 1.0.0 you can use cur_data() to get the data for the current group (with rowwise(), the current row). To pass it to mean() you first need to turn it into a vector, which can be done with unlist() or as.matrix(). So try:

library(dplyr)

df %>% 
  select(starts_with('var')) %>% 
  rowwise() %>% 
  mutate(avg=mean(unlist(cur_data()), na.rm=T))
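
As far as I know, cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). A sketch of the same idea with pick(everything()), assuming dplyr 1.1.0 or later is installed and using a made-up toy data frame, might look like this:

```r
library(dplyr)

# Toy data, invented for this illustration only
toy <- data.frame(var1 = c(NA, 2), var2 = c(3, 4), var3 = c(5, 9))

res <- toy %>%
  select(starts_with("var")) %>%
  rowwise() %>%
  # pick(everything()) is the current row as a one-row tibble;
  # unlist() flattens it to a vector that mean() accepts
  mutate(avg = mean(unlist(pick(everything())), na.rm = TRUE)) %>%
  ungroup()

print(res)
```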

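Your second approach, however, is the correct way to use rowwise(): c_across() collects the selected columns of the current row into a single vector, which is exactly what mean() expects.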
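
If all you need is a row mean, base R's rowMeans() is a vectorised alternative that avoids rowwise() altogether. The sketch below uses a made-up toy data frame and simply checks that both routes agree; it is an illustration, not a recommendation for every case.

```r
library(dplyr)

# Toy data, invented for this illustration only
toy <- data.frame(var1 = c(NA, 2), var2 = c(3, 4), var3 = c(5, 9))

# Row-by-row: c_across() hands mean() one row's values as a vector
rowwise_avg <- toy %>%
  rowwise() %>%
  mutate(avg = mean(c_across(starts_with("var")), na.rm = TRUE)) %>%
  ungroup() %>%
  pull(avg)

# Vectorised: rowMeans() works on the selected columns in one call
vectorised_avg <- rowMeans(toy[startsWith(names(toy), "var")], na.rm = TRUE)

print(rowwise_avg)
print(unname(vectorised_avg))
```

rowMeans() only covers means (and rowSums() sums); for other per-row summaries, rowwise() with c_across() stays the more general tool.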

Ronak Shah