
In this data frame, each id has two rows. I want the average of every column for each id. My concern is how to handle NA: when one of the two values is NA, the other value should be reported.

id <- rep(1:3, each=2)
v1 <- c(1,2,5,NA,9,3)
v2 <- c(8,3,9,7,2,NA)
df <- data.frame(id, v1,v2)
df
  id v1 v2
1  1  1  8
2  1  2  3
3  2  5  9
4  2 NA  7
5  3  9  2
6  3  3 NA

Expected outcome:
id <- c(1,2,3)
v1 <- c(1.5,5,6)
v2 <- c(5.5,8,2)
d <- data.frame(id,v1,v2)
d
  id  v1  v2
1  1 1.5 5.5
2  2 5.0 8.0
3  3 6.0 2.0

If I do it as below, every id/column combination that contains an NA comes out as NA:

library(dplyr)
newdf <- df %>% group_by(id) %>% summarise(across(everything(), mean))
newdf
# A tibble: 3 x 3
     id    v1    v2
  <int> <dbl> <dbl>
1     1   1.5   5.5
2     2  NA     8  
3     3   6    NA  
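The NAs appear because mean() returns NA whenever any of its inputs is NA. A minimal sketch of the same dplyr pipeline that forwards na.rm = TRUE to mean(), so an id with one missing value reports the other value (assumes dplyr >= 1.0.0 for across() and the ~ lambda syntax):

```r
library(dplyr)

df <- data.frame(
  id = rep(1:3, each = 2),
  v1 = c(1, 2, 5, NA, 9, 3),
  v2 = c(8, 3, 9, 7, 2, NA)
)

# na.rm = TRUE drops the NA before averaging, so the mean of c(5, NA)
# is 5 instead of NA. Grouping columns are excluded from everything().
newdf <- df %>%
  group_by(id) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

newdf
```

If both values of an id were NA, mean(..., na.rm = TRUE) would return NaN rather than NA.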
user11916948

1 Answer


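Try aggregate. Its formula method drops every row that has an NA in any column by default (na.action = na.omit), so pass na.action = NULL to keep those rows, and na.rm = TRUE, which is forwarded to mean() so it skips the NA: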

aggregate(.~id, df, mean, na.action = NULL, na.rm = TRUE)
#  id  v1  v2
#1  1 1.5 5.5
#2  2 5.0 8.0
#3  3 6.0 2.0
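To see why the na.action argument matters, the sketch below compares the default call with one that keeps the NA rows. It uses na.action = na.pass, a standard stats function that leaves NAs in place, as an equivalent way to keep those rows:

```r
df <- data.frame(
  id = rep(1:3, each = 2),
  v1 = c(1, 2, 5, NA, 9, 3),
  v2 = c(8, 3, 9, 7, 2, NA)
)

# Default formula method: na.omit removes whole rows with any NA, so
# id 2 loses v2 = 7 and id 3 loses v1 = 3 along with the missing values.
dropped <- aggregate(. ~ id, data = df, FUN = mean)

# Keep every row and let mean() skip the NAs via na.rm = TRUE.
kept <- aggregate(. ~ id, data = df, FUN = mean,
                  na.action = na.pass, na.rm = TRUE)

dropped
kept
```

With the default, id 2 gets v2 = 9 and id 3 gets v1 = 9, because the non-missing values in the dropped rows are discarded too; the second call matches the expected outcome.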
GKi