R data imputation from group_by table

Question

library(dplyr)

group = c(1,1,4,4,4,5,5,6,1,4,6)
animal = c('a','b','c','c','d','a','b','c','b','d','c')
sleep = c(14,NA,22,15,NA,96,100,NA,50,2,1)

test = data.frame(group, animal, sleep)
print(test)

# mean sleep per (group, animal) pair, ignoring NAs
group_animal = test %>% group_by(group, animal) %>% summarise(mean_sleep = mean(sleep, na.rm = TRUE))

I would like to replace the NA values in the sleep column with the mean sleep value for the same group and animal.

Is there a way to perform a lookup, as in Excel, that matches group and animal between the test dataframe and the group_animal dataframe and replaces each NA in the sleep column of test with the corresponding mean_sleep value from group_animal?

score 1 · Accepted Answer · answered Jul 26 '22 at 15:35

We could use mutate instead of summarise. summarise returns a single row per group, whereas mutate keeps every row, so the group mean can be written straight into the NA positions.

library(dplyr)
library(tidyr)
test <- test %>% 
  group_by(group, animal) %>% 
  # within each (group, animal), fill NAs with the mean of the non-missing values
  mutate(sleep = replace_na(sleep, mean(sleep, na.rm = TRUE))) %>%
  ungroup()

-output

test
# A tibble: 11 × 3
   group animal sleep
   <dbl> <chr>  <dbl>
 1     1 a         14
 2     1 b         50
 3     4 c         22
 4     4 c         15
 5     4 d          2
 6     5 a         96
 7     5 b        100
 8     6 c          1
 9     1 b         50
10     4 d          2
11     6 c          1
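
For comparison, the lookup-style approach described in the question can be sketched as a join followed by a fill. This is only an illustration under the question's own data and names (`test`, `group_animal`, `mean_sleep`); the name `filled` is introduced here for the result and is not part of the original answer.

```r
library(dplyr)

# Rebuild the question's example data so the sketch runs on its own.
test <- data.frame(
  group  = c(1, 1, 4, 4, 4, 5, 5, 6, 1, 4, 6),
  animal = c("a", "b", "c", "c", "d", "a", "b", "c", "b", "d", "c"),
  sleep  = c(14, NA, 22, 15, NA, 96, 100, NA, 50, 2, 1)
)

# Lookup table: one row per (group, animal) with the mean of the non-missing values.
group_animal <- test %>%
  group_by(group, animal) %>%
  summarise(mean_sleep = mean(sleep, na.rm = TRUE), .groups = "drop")

# Attach the group mean to every row, then fill only the missing sleep values.
# `filled` is a name chosen for this sketch.
filled <- test %>%
  left_join(group_animal, by = c("group", "animal")) %>%
  mutate(sleep = coalesce(sleep, mean_sleep)) %>%
  select(-mean_sleep)
```

The join keeps the lookup table as a separate object, which may be convenient if the group means are needed elsewhere; the mutate version above avoids the intermediate table.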

akrun

What if I want to use the count instead of the mean? For example, the sleep column contains 'yes' and 'no' values as well as NAs. How do I replace the NAs with the most frequent sleep value for each group and animal? – Michael Zhao Jul 26 '22 at 16:21
@MichaelZhao you can use `replace_na(sleep, Mode(sleep))`, where `Mode` is defined [here](https://stackoverflow.com/questions/31400445/r-how-to-find-the-mode-of-a-vector) – akrun Jul 26 '22 at 17:38
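
Base R has no built-in function for the statistical mode, so `Mode` has to be defined by hand. The sketch below is one possible definition, not necessarily the one from the linked question: it is assumed here that NAs should be ignored and that ties go to the value seen first. The `yes_no` data frame is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (an assumption, not taken from the linked answer):
# most frequent non-missing value; ties go to the value that appears first.
Mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(x[NA_integer_])  # typed NA when there is nothing to count
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up example data with 'yes'/'no' values and some NAs.
yes_no <- data.frame(
  group  = c(1, 1, 1, 1, 4, 4, 4),
  animal = c("a", "a", "a", "a", "c", "c", "c"),
  sleep  = c("yes", "yes", "no", NA, "no", NA, "no")
)

# Same pattern as the mean-based answer, with Mode() in place of mean().
filled <- yes_no %>%
  group_by(group, animal) %>%
  mutate(sleep = replace_na(sleep, Mode(sleep))) %>%
  ungroup()
```

With this data, the NA in group 1 / animal a would be filled with "yes" and the NA in group 4 / animal c with "no".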
