
A really simple evaluation in dplyr::case_when() returns a bizarre error message in dplyr 1.0.8 under R 4.1.2. I've isolated the behavior in this code, where I'm trying to adjust the value of the durationI variable if either of two edge cases occurs:

library(tidyverse)

# Create simple example data
raw <- tribble(
  ~activity_ID, ~durationI, ~distanceI, ~tmode,
             1,        190,         57, "auto",
             2,         23,         41,     NA,
             3,         91,         58, "rail"
)

# Now trip it up
update <- mutate(raw,
  distanceI = ifelse(is.na(tmode), NA, distanceI),
  durationI = case_when(is.na(tmode) ~ NA, durationI > 180 ~ 180,
    TRUE ~ durationI))

# Should result in:
#   activity_ID, durationI, distanceI, tmode
#             1,       180,        57,  auto
#             2,        NA,        NA,    NA
#             3,        91,        58,  rail

When I run this code it produces the following error message:

Error in `mutate()`:
! Problem while computing `durationI = case_when(is.na(tmode) ~
  NA, durationI > 180 ~ 180, TRUE ~ durationI)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
Run `rlang::last_error()` to see where the error occurred.

When I run rlang::last_error() it is similarly unhelpful:

<error/dplyr:::mutate_error>
Error in `mutate()`:
! Problem while computing `durationI = case_when(is.na(mode) ~
  NA, durationI > 180 ~ 180, TRUE ~ durationI)`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
Backtrace:
  1. dplyr::mutate(...)
  6. dplyr::case_when(...)
  7. dplyr:::replace_with(...)
  8. dplyr:::check_type(val, x, name, error_call = error_call)
  9. rlang::abort(msg, call = error_call)
 10. rlang:::signal_abort(cnd, .file)
 11. base::signalCondition(cnd)
 13. rlang:::conditionMessage.rlang_error(cond)
 14. rlang::cnd_message(c)
 15. rlang:::cnd_message_format(cnd, ...)
 16. cli::cli_format(glue_escape(lines), .envir = emptyenv())
Run `rlang::last_trace()` to see the full context.

If I check the lengths of all the variables, they are, of course, all the same. I'm stumped. What am I missing?

Rick Donnelly

2 Answers


You get the error because you are trying to mix a logical vector and a numeric (double) vector in the same case_when() call.

In your case_when statement:

case_when(
  is.na(tmode) ~ NA,
  durationI > 180 ~ 180,
  TRUE ~ durationI
)

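The right-hand side of your first case is NA, which in R is a logical constant. In dplyr 1.0.8, case_when() requires every right-hand side to have the same type and takes that type from the first one, so it now expects a logical vector. When the second case returns a numeric value (180), the type check fails. The confusing `names` message is a side effect: the backtrace shows the error being raised by check_type(), and then something going wrong inside cli_format() while that type-mismatch message is being formatted, so the real explanation never reaches you.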
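A rough base-R sketch helps show why the error appears even though only row 2 selects the NA case. `strict_case_when_sketch()` is a hypothetical helper written for illustration, not dplyr's actual code: it assumes conditions and values are passed as plain lists and skips dplyr's formula syntax, length checks and class checks. It only mimics the two points that matter here: the first right-hand side fixes the output type, and every right-hand side is type-checked against it, whether or not any row selects that case.

```r
# Hypothetical helper (not part of dplyr): a rough imitation of the strict
# type rule in dplyr 1.0.8 case_when().
strict_case_when_sketch <- function(conditions, values) {
  n <- length(conditions[[1]])
  # Start from an all-NA vector with the same type as the first value
  out <- values[[1]][rep(NA_integer_, n)]
  done <- rep(FALSE, n)
  for (i in seq_along(conditions)) {
    val <- values[[i]]
    # Every case is checked, even if no row ends up selecting it
    if (!identical(typeof(val), typeof(out))) {
      stop(sprintf("case %d returns a %s vector, but the output is %s",
                   i, typeof(val), typeof(out)), call. = FALSE)
    }
    if (length(val) == 1) val <- rep(val, n)
    # Fill rows where this condition is TRUE and no earlier case matched
    hit <- conditions[[i]] & !is.na(conditions[[i]]) & !done
    out[hit] <- val[hit]
    done <- done | hit
  }
  out
}

tmode <- c("auto", NA, "rail")
durationI <- c(190, 23, 91)
conds <- list(is.na(tmode), durationI > 180, rep(TRUE, 3))

# NA is logical, so the double 180 in case 2 is rejected
try(strict_case_when_sketch(conds, list(NA, 180, durationI)))

# NA_real_ is double, so all three cases agree
strict_case_when_sketch(conds, list(NA_real_, 180, durationI))  # returns c(180, NA, 91)
```

Because the check runs for every case, it does not matter which rows actually match: the type of NA and the type of 180 disagree before any row is considered.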

You can fix this by replacing NA with NA_real_, the missing value of type double (numeric):

raw %>% 
  mutate(
    distanceI = ifelse(is.na(tmode), NA, distanceI),
    durationI = case_when(
      is.na(tmode) ~ NA_real_,
      durationI > 180 ~ 180,
      TRUE ~ durationI
    )
  )
#> # A tibble: 3 × 4
#>   activity_ID durationI distanceI tmode
#>         <dbl>     <dbl>     <dbl> <chr>
#> 1           1       180        57 auto 
#> 2           2        NA        NA <NA> 
#> 3           3        91        58 rail
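The distanceI line uses the same bare NA but does not fail, because base R's ifelse() is not type-strict: it builds its result from the logical test vector and then assigns into it, so R's usual coercion rules promote logical to double. A minimal illustration, using made-up values in the same shape as the question's data:

```r
# ifelse() starts from the logical test vector; assigning the double
# "no" values into it coerces the whole result to double
x <- ifelse(c(TRUE, FALSE, FALSE), NA, c(57, 41, 58))
typeof(x)  # "double"
x          # NA 41 58
```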
jpiversen
    Thanks for the quick response on this. I had assumed that case_when() would cast the result based upon whichever case it chose (e.g., evaluate case, then evaluate class). Your reply was very helpful. – Rick Donnelly Feb 25 '22 at 18:39

I had a similar issue caused by inadvertently mixing double (numeric) and integer types:

# x was an integer column, and I was trying to replace NA with 1 (a double)
df %>%
  mutate(x = case_when(is.na(x) ~ 1, TRUE ~ x))

Changing 1 to 1L fixed the issue.
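For reference, this is how the types involved look in base R. The snippet only uses typeof() and ordinary subassignment, and the vector x is made up for illustration. It also shows why the mismatch is easy to miss: plain assignment quietly promotes an integer vector to double, whereas case_when() in dplyr 1.0.8 refuses to mix the two.

```r
typeof(1)            # "double": unsuffixed numeric literals are doubles
typeof(1L)           # "integer": the L suffix makes an integer literal
typeof(NA_integer_)  # "integer": typed missing value for integer columns

# Base-R subassignment silently promotes the whole vector to double
x <- c(3L, NA, 7L)
x[is.na(x)] <- 1
typeof(x)            # "double"
```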

tauft