mutate_at() throws an evaluation error when it is used after group_by() and the first argument (.vars) is a numeric vector of column positions.

  • The issue shows up with R 3.4.2 and dplyr 0.7.4
  • It works fine with R 3.3.2 and dplyr 0.5.0
  • It works fine if .vars is a character vector (column names)

Example:

library(dplyr)

# Create example dataframe
Id <- c('10_1', '10_2', '11_1', '11_2', '11_3', '12_1')
Month <- c(2, 3, 4, 6, 7, 8)
RWA <- c(0, 0, 0, 1.579, NA, 0.379)
dftest = data.frame(Id, Month, RWA)

# Define column to fill NAs
nacol = c('RWA')

# Replace NAs with 0
dftest_2 <- dftest %>%
  group_by(Id) %>%
  mutate_at(which(names(dftest) %in% nacol), 
            funs(ifelse(is.na(.),0,.)))

Error in mutate_impl(.data, dots) : 
Evaluation error: object 'NA' not found.

A more sensible example demonstrating the issue (one Id now has multiple rows):

library(zoo)  # provides na.locf()

# Create example dataframe
Id <- c('10_1', '10_2', '11_1', '11_3', '11_3', '12_1')
Month <- c(2, 3, 4, 6, 7, 8)
RWA <- c(0, 0, 0, 1.579, NA, 0.379)
dftest = data.frame(Id, Month, RWA)

# Define column to fill NAs
nacol = c('RWA')

# Fill NAs with last period
dftest_2 <- dftest %>%
  group_by(Id) %>%
  mutate_at(which(names(dftest) %in% nacol), 
            funs(na.locf(., na.rm = FALSE)))
  • Try `dftest %>% group_by(Id) %>% mutate_at(intersect(names(.), nacol), funs(replace(., is.na(.), 0)))` – akrun Nov 17 '17 at 16:26
  • @akrun that works because the first argument (.vars) is given as a character vector (column name). What is not working is using a numerical vector with column position for .vars – Yannos Michailidis Nov 17 '17 at 16:35

1 Answer

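The error occurs because which() returns 3, the position of 'RWA' in the ungrouped data frame, but after group_by(Id) the grouping column is excluded from the columns that mutate_at() selects from. Only 2 columns remain, so position 3 matches no column and resolves to NA. Subtracting 1 from the position fixes it: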

dftest %>%
     group_by(Id) %>% 
     mutate_at(which(names(dftest) %in% nacol)-1, funs(ifelse(is.na(.),0,.)))
# A tibble: 6 x 3
# Groups:   Id [6]
#      Id Month   RWA
#  <fctr> <dbl> <dbl>
#1   10_1     2 0.000
#2   10_2     3 0.000
#3   11_1     4 0.000
#4   11_2     6 1.579
#5   11_3     7 0.000
#6   12_1     8 0.379
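
The hard-coded `-1` above only fits a single grouping column that sits before the target. As a minimal sketch, the position can instead be computed against the non-grouping columns. It assumes, as the behaviour above suggests, that mutate_at() counts positions among the non-grouping columns; the names `pos_all`, `non_group_cols`, and `pos_non_group` are illustrative, and a plain function is passed in place of funs():

```r
library(dplyr)

# Example data from the question
dftest <- data.frame(
  Id    = c("10_1", "10_2", "11_1", "11_2", "11_3", "12_1"),
  Month = c(2, 3, 4, 6, 7, 8),
  RWA   = c(0, 0, 0, 1.579, NA, 0.379)
)
nacol <- c("RWA")

grouped <- group_by(dftest, Id)

# Position among all columns (what the question passed to mutate_at)
pos_all <- which(names(dftest) %in% nacol)

# Columns left once the grouping columns are set aside
non_group_cols <- setdiff(names(grouped), group_vars(grouped))

# Position among the non-grouping columns
# (assumption: this is the indexing mutate_at applies to numeric .vars)
pos_non_group <- which(non_group_cols %in% nacol)

# Replace NAs with 0 in the selected column
result <- grouped %>%
  mutate_at(pos_non_group, function(x) replace(x, is.na(x), 0))
```

Here `pos_all` is 3 while `pos_non_group` is 2, which is the same offset the answer applies by hand.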

The group_by part is not needed here, as we are only replacing the NA values in a column with 0:

dftest %>%
    mutate_at(which(names(dftest) %in% nacol), funs(ifelse(is.na(.),0,.)))

This could be a bug, and the position-based approach is risky in any case because positions shift once grouping columns are excluded. A better option is to select by name:

dftest %>%
    group_by(Id) %>% 
    mutate_at(intersect(names(.), nacol), funs(replace(., is.na(.), 0)))

NOTE: In all of these cases, the group_by step is not needed.


Another option is replace_na from tidyr:

dftest %>%
    tidyr::replace_na(as.list(setNames(0, nacol)))
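
For the question's second example, where grouping does matter, a name-based sketch might look like the following. It assumes the zoo package is installed for na.locf() and passes a plain function in place of funs(); `filled` is an illustrative name:

```r
library(dplyr)
library(zoo)  # provides na.locf()

# Second example from the question: Id '11_3' has two rows
dftest <- data.frame(
  Id    = c("10_1", "10_2", "11_1", "11_3", "11_3", "12_1"),
  Month = c(2, 3, 4, 6, 7, 8),
  RWA   = c(0, 0, 0, 1.579, NA, 0.379)
)
nacol <- c("RWA")

# Carry the last observation forward within each Id,
# selecting the column by name so grouping cannot shift it
filled <- dftest %>%
  group_by(Id) %>%
  mutate_at(nacol, function(x) na.locf(x, na.rm = FALSE)) %>%
  ungroup()
```

Because the column is selected by name, the result does not depend on where 'RWA' sits relative to the grouping column. The NA in the second '11_3' row is filled with 1.579 from the row before it.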
akrun
  • Makes sense, thanks. Yes, group_by is not needed here - I oversimplified the example - but if we were to have multiple rows with the same id and `funs(na.locf(., na.rm=F))` for example, `group_by` use would be sensible. – Yannos Michailidis Nov 17 '17 at 16:49
  • @akrun I was just having this issue while I was relearning how to specify columns with column positions. Great info. +1 – jazzurro Dec 02 '17 at 14:30