mutate_at evaluation error when using group_by

Question

mutate_at() throws an evaluation error when it is used with group_by() and the first argument (.vars) is a numeric vector of column positions.

The issue shows up with R 3.4.2 and dplyr 0.7.4.
It works fine with R 3.3.2 and dplyr 0.5.0.
It also works fine if .vars is a character vector (column names).

Example:

library(dplyr)

# Create example dataframe
Id <- c('10_1', '10_2', '11_1', '11_2', '11_3', '12_1')
Month <- c(2, 3, 4, 6, 7, 8)
RWA <- c(0, 0, 0, 1.579, NA, 0.379)
dftest = data.frame(Id, Month, RWA)

# Define column to fill NAs
nacol = c('RWA')

# Replace NAs with 0
dftest_2 <- dftest %>%
  group_by(Id) %>%
  mutate_at(which(names(dftest) %in% nacol), 
            funs(ifelse(is.na(.),0,.)))

Error in mutate_impl(.data, dots) : 
Evaluation error: object 'NA' not found.

More sensible example demonstrating issue:

library(dplyr)
library(zoo)  # provides na.locf()

# Create example dataframe
Id <- c('10_1', '10_2', '11_1', '11_3', '11_3', '12_1')
Month <- c(2, 3, 4, 6, 7, 8)
RWA <- c(0, 0, 0, 1.579, NA, 0.379)
dftest = data.frame(Id, Month, RWA)

# Define column to fill NAs
nacol = c('RWA')

# Fill NAs with last period
dftest_2 <- dftest %>%
  group_by(Id) %>%
  mutate_at(which(names(dftest) %in% nacol), 
            funs(na.locf(., na.rm = FALSE)))

Try `dftest %>% group_by(Id) %>% mutate_at(intersect(names(.), nacol), funs(replace(., is.na(.), 0)))` — akrun, Nov 17 '17 at 16:26
@akrun that works because the first argument (.vars) is given as a character vector (column name). What is not working is using a numerical vector with column position for .vars — Yannos Michailidis, Nov 17 '17 at 16:35

akrun · Accepted Answer · 2017-11-17T16:50:13.883

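The error occurs because which() returns 3 (the position of RWA in dftest), but after grouping by 'Id' the grouping column is no longer among the columns that mutate_at selects from. Only 2 columns remain, so position 3 matches nothing and yields NA.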
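The mismatch can be reproduced in base R without dplyr. This is only a sketch: it assumes that mutate_at resolves positions against the non-grouping columns, which is what the error suggests, and `candidates` is a name made up for the illustration.

```r
# Same data as in the question
dftest <- data.frame(
  Id    = c("10_1", "10_2", "11_1", "11_2", "11_3", "12_1"),
  Month = c(2, 3, 4, 6, 7, 8),
  RWA   = c(0, 0, 0, 1.579, NA, 0.379)
)
nacol <- c("RWA")

# Position of RWA among ALL columns of the ungrouped data frame
pos <- which(names(dftest) %in% nacol)      # 3

# Assumption: only the non-grouping columns are candidates for .vars
candidates <- setdiff(names(dftest), "Id")  # "Month" "RWA"

candidates[pos]      # NA: position 3 is past the end of a length-2 vector
candidates[pos - 1]  # "RWA": shifting by one grouping column lines up again
```

Subtracting 1 from the position compensates for the dropped grouping column: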

dftest %>%
     group_by(Id) %>% 
     mutate_at(which(names(dftest) %in% nacol)-1, funs(ifelse(is.na(.),0,.)))
# A tibble: 6 x 3
# Groups:   Id [6]
#      Id Month   RWA
#  <fctr> <dbl> <dbl>
#1   10_1     2 0.000
#2   10_2     3 0.000
#3   11_1     4 0.000
#4   11_2     6 1.579
#5   11_3     7 0.000
#6   12_1     8 0.379

The group_by part is not needed here, since we are only changing NA values in other columns to 0:

dftest %>%
    mutate_at(which(names(dftest) %in% nacol), funs(ifelse(is.na(.),0,.)))

This could be a bug, and the position-based approach can be risky in any case. A better option is to select the columns by name:

dftest %>%
    group_by(Id) %>% 
    mutate_at(intersect(names(.), nacol), funs(replace(., is.na(.), 0)))

NOTE: In all these cases, the group_by is not needed

Another option is replace_na from tidyr

dftest %>%
    tidyr::replace_na(as.list(setNames(0, nacol)))
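
For readers on a newer dplyr, the name-based approach can also be written without funs(). This is a sketch that assumes dplyr >= 1.0.0, where across() and all_of() are available; it is not something that works on the 0.7.4 version from the question.

```r
library(dplyr)

dftest <- data.frame(
  Id    = c("10_1", "10_2", "11_1", "11_2", "11_3", "12_1"),
  Month = c(2, 3, 4, 6, 7, 8),
  RWA   = c(0, 0, 0, 1.579, NA, 0.379)
)
nacol <- c("RWA")

# Assumes dplyr >= 1.0.0: select by name, so grouping cannot shift positions
result <- dftest %>%
  group_by(Id) %>%
  mutate(across(all_of(nacol), ~ replace(.x, is.na(.x), 0))) %>%
  ungroup()
```

Because the columns are selected by name, the result does not depend on how many grouping columns come before them.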

Makes sense, thanks. Yes group_by is not needed here - I oversimplified the example - but if we were to have multiple rows with the same id and `funs(na.locf(., na.rm=F))` for example, `group_by` use would be sensible. — Yannos Michailidis, Nov 17 '17 at 16:49
@akrun I was just having this issue while I was relearning how to specify columns with column positions. Great info. +1 — jazzurro, Dec 02 '17 at 14:30
