Replace NA with group means in an unspecified number of columns

Question

I want to replace the NAs in multiple columns with the mean of each group (collembola and mite). Here is an example with 3 columns; however, I want to apply this to a data frame with 5000 columns.

dat <- read.table(text = 
                  "id    ID        length  width    extra
                  101   collembola  2.1     0.9     1
                  102   mite        NA      0.7     NA
                  103   mite        1.1     0.8     2
                  104   collembola  1       NA      3
                  105   collembola  1.5     0.5     4
                  106   mite        NA      NA      NA
                  106   mite        1.9     NA      4", 
                  header=TRUE)

It works if I specify each column by name:

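library(plyr)
# replace the NAs in a vector x with the mean of its non-missing values
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
# ddply() splits dat by ID, so the mean is taken within each group
data2 <- ddply(dat, ~ ID, transform, length = impute.mean(length))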
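To see what `impute.mean()` does on its own, here is a minimal sketch applying it to the `length` values of the four mite rows from the example data (the name `mite_length` is only for illustration). `ddply()` runs the same function once per `ID` group.

```r
# Replace the NAs in a vector with the mean of its non-missing values
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# length values of the mite rows (ids 102, 103, 106, 106)
mite_length <- c(NA, 1.1, NA, 1.9)
filled <- impute.mean(mite_length)
print(filled)
#> [1] 1.5 1.1 1.5 1.9
```

If a group has no observed values in a column, `mean(x, na.rm = TRUE)` returns `NaN`, so those cells become `NaN` rather than staying `NA`.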

I want to apply the function across multiple columns, computing the mean separately for each `ID` group (collembola and mite). Below is what I tried, which does not work:

dat2 <- ddply(dat, ~ ID, transform,  impute.mean(dat[,3:ncol(dat)]))
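The attempt fails because, inside `transform()`, `dat[,3:ncol(dat)]` refers to the full `dat` rather than the per-group subset that `ddply()` passes in, and `impute.mean()` is written for a single vector, not a data frame. A minimal sketch that stays with plyr instead gives `ddply()` an anonymous function that imputes every selected column of each group. It assumes, as in the attempt, that the columns to impute are the third through the last; `cols` is an illustrative name.

```r
library(plyr)

dat <- data.frame(
  id     = c(101, 102, 103, 104, 105, 106, 106),
  ID     = c("collembola", "mite", "mite", "collembola", "collembola", "mite", "mite"),
  length = c(2.1, NA, 1.1, 1, 1.5, NA, 1.9),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA, NA),
  extra  = c(1, NA, 2, 3, 4, NA, 4)
)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Impute every column from the third onward, however many there are
cols <- names(dat)[3:ncol(dat)]

# ddply() hands each ID group to the function as the data frame `d`;
# impute its columns and return the modified piece
dat2 <- ddply(dat, ~ ID, function(d) {
  d[cols] <- lapply(d[cols], impute.mean)
  d
})
```

Because the column set comes from `3:ncol(dat)`, the same call works unchanged for 3 or 5000 columns. Note that `ddply()` returns the rows sorted by `ID`, not in their original order.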

yeedle · Accepted Answer · 2017-06-06T23:47:43.250

If you don't mind using dplyr:

library(dplyr)

dat %>% 
  group_by(ID) %>% 
  # NAs in each numeric column are replaced by that column's mean within the current ID group
  mutate_if(is.numeric, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
#> # A tibble: 7 x 5
#> # Groups:   ID [2]
#>      id         ID length width extra
#>   <int>     <fctr>  <dbl> <dbl> <dbl>
#> 1   101 collembola    2.1  0.90     1
#> 2   102       mite    1.5  0.70     3
#> 3   103       mite    1.1  0.80     2
#> 4   104 collembola    1.0  0.70     3
#> 5   105 collembola    1.5  0.50     4
#> 6   106       mite    1.5  0.75     3
#> 7   106       mite    1.9  0.75     4
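`mutate_if()` still works but has been superseded by `across()` since dplyr 1.0.0. A minimal sketch of the same grouped imputation in that style, assuming dplyr 1.0.0 or later and reusing the question's `impute.mean()` in place of the `ifelse()` call:

```r
library(dplyr)

dat <- data.frame(
  id     = c(101, 102, 103, 104, 105, 106, 106),
  ID     = c("collembola", "mite", "mite", "collembola", "collembola", "mite", "mite"),
  length = c(2.1, NA, 1.1, 1, 1.5, NA, 1.9),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA, NA),
  extra  = c(1, NA, 2, 3, 4, NA, 4)
)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

dat2 <- dat %>%
  group_by(ID) %>%
  # within each ID group, fill the NAs of every numeric column with its group mean
  mutate(across(where(is.numeric), impute.mean)) %>%
  ungroup()
```

Unlike `ddply()`, `mutate()` keeps the rows in their original order. The final `ungroup()` drops the grouping so later operations on `dat2` are not silently done per group.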

edited Jun 06 '17 at 23:47

answered Jun 06 '17 at 22:44

yeedle

Your answer calculates the mean of the full columns; what I want is the mean within each `ID` group – Al14 Jun 06 '17 at 23:02
Edited my answer to fix that. – yeedle Jun 06 '17 at 23:18

lebelinoz · Answer 2 · 2017-06-06T22:52:30.663

Try naming each column in the `ddply()` call:

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ ID, transform, length = impute.mean(length),
          width = impute.mean(width), extra = impute.mean(extra))
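Listing every column by hand does not scale to 5000 columns. A minimal base-R sketch that avoids both the explicit names and any package dependency uses `ave()`, which applies a function within each group and returns the values in the original row order. As in the question, it assumes the columns to impute start at the third one; `cols` is an illustrative name.

```r
dat <- data.frame(
  id     = c(101, 102, 103, 104, 105, 106, 106),
  ID     = c("collembola", "mite", "mite", "collembola", "collembola", "mite", "mite"),
  length = c(2.1, NA, 1.1, 1, 1.5, NA, 1.9),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA, NA),
  extra  = c(1, NA, 2, 3, 4, NA, 4)
)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

cols <- 3:ncol(dat)
# ave() splits each column by ID, applies impute.mean to every piece,
# and puts the results back in the original row order
dat[cols] <- lapply(dat[cols], function(x) ave(x, dat$ID, FUN = impute.mean))
```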

edited Jun 06 '17 at 22:52

answered Jun 06 '17 at 22:49

lebelinoz

The title of my question is about multiple columns – Al14 Jun 06 '17 at 22:52
@Al14 Noted. Removed snarky reference to other question. – lebelinoz Jun 06 '17 at 22:53
