Vector size error with case_when function in R

Question

Minimal reproducible example. I don't understand why `v2011` isn't well defined: the code below fails with a vector size error.

library(dplyr)  # for %>%, group_by(), mutate(), case_when()

myDf <- data.frame(
  Year = c(2010, 2012, 2013, 2010:2013),
  value = rnorm(7),
  group = c(rep("A", 3), rep("B", 4))
)

# Fails with a vector size error
myDf %>%
  group_by(group) %>%
  mutate(v2010 = case_when(2010 %in% Year ~ value[Year == 2010], T ~ NA),
         v2011 = case_when(2011 %in% Year ~ value[Year == 2011], T ~ NA))

This is not your issue, but you have a misplaced `)`. `v2011 = case_when(2011 %in% Year ~ value[Year == 2011]), T ~ NA)` should be `v2011 = case_when(2011 %in% Year ~ value[Year == 2011], T ~ NA))`. As I said, though, that's not your issue: the code still doesn't work after fixing it. — Gregor Thomas, Jul 26 '23 at 21:04
Group "A" gets a zero-length result when checking for 2011 to determine v2011; you also have a misplaced parenthesis after `[Year == 2011]`. As @TarJae mentions, what should the output look like? There's probably a more straightforward operation. — I_O, Jul 26 '23 at 21:06
@TarJae It's the output produced by Gregor Thomas. Apologies for not specifying — Zslice, Jul 26 '23 at 21:18
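The zero-length result described in these comments can be reproduced in base R without dplyr. This is a sketch: the years copy group "A" from the question, while the values are arbitrary placeholders because the question uses `rnorm()`.

```r
# Years of group "A" in the question; the values are arbitrary placeholders
Year  <- c(2010, 2012, 2013)
value <- c(0.5, -1.2, 0.8)

Year == 2011              # FALSE FALSE FALSE: no row matches
v <- value[Year == 2011]  # subsetting with an all-FALSE index
length(v)                 # 0: an empty vector, not NA
```

`mutate()` needs each result to have length 1 or the group's length, so an empty vector like this one is what ultimately triggers the size error.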

Gregor Thomas · Accepted Answer · 2023-07-26T22:52:16.793

For a length-1 test (per group), you can use `if () {} else {}` directly. Unlike vectorized functions such as `ifelse`, `if_else`, and `case_when`, which generally compute all of their possible results, the code in an `if` branch is only evaluated when its condition is TRUE.
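A minimal standalone illustration of that difference, assuming dplyr is installed. The `stop()` calls are only there to reveal which branches actually get evaluated.

```r
library(dplyr)

# Scalar if/else: the unused branch is never evaluated, so no error
res_if <- if (FALSE) stop("if branch evaluated") else NA

# case_when(): every right-hand side is evaluated, even when its
# condition is FALSE, so the stop() call fires and is caught here
res_cw <- tryCatch(
  case_when(FALSE ~ stop("case_when branch evaluated"), TRUE ~ NA),
  error = function(e) "error"
)

res_if  # NA
res_cw  # "error"
```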

myDf %>%
  group_by(group) %>%
  mutate(
    v2010 = if(2010 %in% Year) value[Year == 2010] else NA,
    v2011 = if(2011 %in% Year) value[Year == 2011] else NA
  )
# # A tibble: 7 × 5
# # Groups:   group [2]
#    Year  value group  v2010  v2011
#   <dbl>  <dbl> <chr>  <dbl>  <dbl>
# 1  2010  0.233 A      0.233 NA    
# 2  2012 -1.30  A      0.233 NA    
# 3  2013  1.42  A      0.233 NA    
# 4  2010 -0.685 B     -0.685  0.718
# 5  2011  0.718 B     -0.685  0.718
# 6  2012  0.447 B     -0.685  0.718
# 7  2013  0.816 B     -0.685  0.718
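A related base-R idiom, not part of this answer, avoids the branch entirely: `match()` returns `NA` when the year is absent, and indexing a vector with `NA` gives a length-1 `NA`. A sketch on a small made-up data frame with fixed values so the result is predictable:

```r
library(dplyr)

# Made-up data with the same shape as the question's myDf
df <- data.frame(
  Year  = c(2010, 2012, 2013, 2010:2013),
  value = c(1, 2, 3, 4, 5, 6, 7),
  group = c(rep("A", 3), rep("B", 4))
)

out <- df %>%
  group_by(group) %>%
  # match() gives the position of the first matching year, or NA if absent
  mutate(v2010 = value[match(2010, Year)],
         v2011 = value[match(2011, Year)]) %>%
  ungroup()

out$v2011  # NA for group A, 5 for group B
```

Note that `match()` only uses the first matching row, so if a year can appear more than once within a group, only its first value is kept.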

Of course, your operation doesn't scale well: if you want to do this for more than one or two columns, it becomes repetitive and annoying to code. One alternative is to filter, pivot, and join. By adjusting the filter, you can do this for one or many years with the same amount of code:

library(dplyr)
library(tidyr)
myDf |> 
  filter(Year %in% c(2010, 2011)) |>
  pivot_wider(id_cols = group, names_from = Year, names_prefix = "v", values_from = value) |>
  right_join(myDf, by = "group")  # explicit `by` avoids the "Joining with" message
# # A tibble: 7 × 5
#   group  v2010  v2011  Year  value
#   <chr>  <dbl>  <dbl> <dbl>  <dbl>
# 1 A      0.233 NA      2010  0.233
# 2 A      0.233 NA      2012 -1.30 
# 3 A      0.233 NA      2013  1.42 
# 4 B     -0.685  0.718  2010 -0.685
# 5 B     -0.685  0.718  2011  0.718
# 6 B     -0.685  0.718  2012  0.447
# 7 B     -0.685  0.718  2013  0.816
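To see the "same amount of code" point in action, here is a sketch where only the `years` vector changes. It uses made-up data, the native pipe (R 4.1 or later), and tidyr's `names_sort` argument to order the new columns by year:

```r
library(dplyr)
library(tidyr)

# Made-up data with the same shape as the question's myDf
df <- data.frame(
  Year  = c(2010, 2012, 2013, 2010:2013),
  value = 1:7,
  group = c(rep("A", 3), rep("B", 4))
)
years <- 2010:2013  # the only line to change for a different set of years

wide <- df |>
  filter(Year %in% years) |>
  pivot_wider(id_cols = group, names_from = Year, names_prefix = "v",
              values_from = value, names_sort = TRUE)

# Group "A" has no 2011 row, so its v2011 is filled with NA
out <- right_join(wide, df, by = "group")
```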

This resolves my issue, but do you know why case_when fails? — Zslice, Jul 26 '23 at 21:22
See my comment on Andrea's answer. `case_when` (and `if_else` and `ifelse`) usually tries to calculate all the results and pick the right ones based on the test condition. They don't handle 0-length results well. — Gregor Thomas, Jul 26 '23 at 22:47
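A small sketch of the 0-length problem in isolation, assuming dplyr is installed. Depending on the dplyr version, `case_when()` either errors here or returns a zero-length vector; neither is the length-1 `NA` the question expects, and inside a grouped `mutate()` both end in a size error.

```r
library(dplyr)

# What group "A" effectively computes for v2011: the condition is FALSE,
# but the empty right-hand side is still evaluated and sized
res <- tryCatch(
  case_when(FALSE ~ numeric(0), TRUE ~ NA),
  error = function(e) e
)

# Either an error object or a result that is not length 1
inherits(res, "error") || length(res) != 1  # TRUE
```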

Jan · Answer 2 · 2023-07-27T04:37:38.527

Don't use `value[Year == 2011]` directly. The error occurs because the first group ('A') has no Year 2011, so that subset is empty. Instead, check whether it is empty (`!length(value[Year == 2011])`) and return NA in that case:

set.seed(123)

myDf %>%
    group_by(group) %>%
    mutate(
        v2010 = case_when(2010 %in% Year ~ ifelse(!length(value[Year == 2010]),
                                                  NA,
                                                  value[Year == 2010]), T ~ NA),
        v2011 = case_when(2011 %in% Year ~ ifelse(!length(value[Year == 2011]),
                                                  NA,
                                                  value[Year == 2011]), T ~ NA)
    )

# # A tibble: 7 × 5
# # Groups:   group [2]
#    Year   value group   v2010  v2011
#   <dbl>   <dbl> <chr>   <dbl>  <dbl>
# 1  2010 -0.0867 A     -0.0867 NA    
# 2  2012  1.44   A     -0.0867 NA    
# 3  2013  1.13   A     -0.0867 NA    
# 4  2010  0.834  B      0.834  -0.287
# 5  2011 -0.287  B      0.834  -0.287
# 6  2012  0.373  B      0.834  -0.287
# 7  2013  0.403  B      0.834  -0.287
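The same `!length()` check can be moved into a small helper that uses a scalar `if`, so each subset expression is written once per column instead of twice. The helper name `first_or_na()` is hypothetical, and the data below are made up:

```r
library(dplyr)

# Hypothetical helper: first element of x, or NA when x is empty
first_or_na <- function(x) if (length(x)) x[[1]] else NA

df <- data.frame(
  Year  = c(2010, 2012, 2013, 2010:2013),
  value = c(1, 2, 3, 4, 5, 6, 7),
  group = c(rep("A", 3), rep("B", 4))
)

out <- df %>%
  group_by(group) %>%
  mutate(v2010 = first_or_na(value[Year == 2010]),
         v2011 = first_or_na(value[Year == 2011])) %>%
  ungroup()
```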

Presumably they want the value when the Year is 2011, if 2011 is present in the group. — Gregor Thomas, Jul 26 '23 at 21:09
To echo, v2011 should be -0.287 for all years for the B group — Zslice, Jul 26 '23 at 21:24

score 1 · Answer 3 · answered Jul 26 '23 at 21:18


I think that `value[Year == 2011]` is evaluated even if 2011 is not in `Year`.

This may help you understand what's going on:

myDf %>%
  group_by(group) %>%
  mutate(v2010 = case_when(2010 %in% Year ~ value[Year == 2010], 
                           T ~ NA),
         # 3000 is in no group's Year, yet this stop() still fires
         v2011 = case_when(3000 %in% Year ~ stop("this is being evaluated!"), 
                           T ~ NA_integer_)
  )

You could use:

library(purrr)  # provides is_empty()

myDf %>%
  group_by(group) %>% 
  mutate(v2010 = ifelse(is_empty(value[Year==2010]), NA_integer_, value[Year==2010]),
         v2011 = ifelse(is_empty(value[Year==2011]), NA_integer_, value[Year==2011]))

Andrea Barghetti


Would you have insight on why 3000 is being evaluated? – Zslice Jul 26 '23 at 21:29
Nice answer @Andrea! @Zslice, though it is about base `ifelse` not `dplyr::case_when()`, [this answer should give a general idea](https://stackoverflow.com/q/16275149/903061). Basically, for `ifelse()` or `case_when`, to take advantage of R's vectorization, they (generally) evaluate the possible results for both the TRUE and FALSE cases, and then use the test condition to pick out which result is appropriate. But that runs into issues in cases like this where one of the results is of length 0. – Gregor Thomas Jul 26 '23 at 22:45
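A base-R sketch of the mechanism this comment describes: when the test contains both TRUE and FALSE values, `ifelse()` evaluates both `yes` and `no`, recycles them to the test's length, and then picks elementwise.

```r
test <- c(TRUE, FALSE, TRUE)
yes  <- c(1, 2, 3)
no   <- c(10, 20, 30)

# Positions 1 and 3 come from `yes`, position 2 from `no`
picked <- ifelse(test, yes, no)
picked  # 1 20 3
```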
