Minimal working example. I don't understand why v2011 is not well defined.

library(dplyr)

myDf <- data.frame(
Year = c(2010, 2012, 2013, 2010:2013),
value = rnorm(7),
group = c(rep("A", 3), rep("B", 4))
)

myDf %>%
  group_by(group) %>%
  mutate(v2010 = case_when(2010 %in% Year ~ value[Year == 2010], T ~ NA),
         v2011 = case_when(2011 %in% Year ~ value[Year == 2011], T ~ NA))

Zslice
  • What is the expected output? – TarJae Jul 26 '23 at 21:01
  • This is not your issue, but you have a misplaced `)`. `v2011 = case_when(2011 %in% Year ~ value[Year == 2011]), T ~ NA)` should be `v2011 = case_when(2011 %in% Year ~ value[Year == 2011], T ~ NA))` (though, as said, that's not your issue, the code still doesn't work.) – Gregor Thomas Jul 26 '23 at 21:04
  • Group "A" gets a zero-length result when checking for 2011 to determine v2011; you also have a misplaced parenthesis after `[Year == 2011]`. As @TarJae mentions: what should the output look like? (It seems there'll be a more straightforward operation.) – I_O Jul 26 '23 at 21:06
  • @TarJae It's the output produced by Gregor Thomas. Apologies for not specifying. – Zslice Jul 26 '23 at 21:18

3 Answers

For a length-1 test (per group), you can use if () {} else {} directly. Unlike vectorized functions such as ifelse, if_else, and case_when, the code inside an if branch is evaluated only when the condition is true.

myDf %>%
  group_by(group) %>%
  mutate(
    v2010 = if(2010 %in% Year) value[Year == 2010] else NA,
    v2011 = if(2011 %in% Year) value[Year == 2011] else NA
  )
# # A tibble: 7 × 5
# # Groups:   group [2]
#    Year  value group  v2010  v2011
#   <dbl>  <dbl> <chr>  <dbl>  <dbl>
# 1  2010  0.233 A      0.233 NA    
# 2  2012 -1.30  A      0.233 NA    
# 3  2013  1.42  A      0.233 NA    
# 4  2010 -0.685 B     -0.685  0.718
# 5  2011  0.718 B     -0.685  0.718
# 6  2012  0.447 B     -0.685  0.718
# 7  2013  0.816 B     -0.685  0.718
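
As a minimal base-R sketch of why the guard matters (the vectors below are made-up stand-ins for a group with no 2011 row, not the question's random data): the unguarded subset has length zero, while if skips the branch it does not take.

```r
# Illustrative data for a group that has no 2011 row (values are made up)
Year  <- c(2010, 2012, 2013)
value <- c(0.5, -1.2, 0.8)

# Subsetting on a year that is absent yields a zero-length vector
empty_hit <- value[Year == 2011]
length(empty_hit)

# if/else evaluates only the branch it takes, so stop() is never reached
v2011 <- if (2011 %in% Year) stop("not reached") else NA
v2010 <- if (2010 %in% Year) value[Year == 2010] else NA
```

A length-1 result such as v2010 or v2011 is what mutate() recycles across the rows of each group.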

Of course, your operation doesn't generalize well: if you want to do this for more than one or two columns, it becomes repetitive and annoying to code. One alternative is to filter, pivot, and join. By adjusting the filter, you can do this for one or many years with the same amount of code:

library(tidyr)
myDf |> 
  filter(Year %in% c(2010, 2011)) |>
  pivot_wider(id_cols = group, names_from = Year, names_prefix = "v", values_from = value) |>
  right_join(myDf)
# Joining with `by = join_by(group)`
# # A tibble: 7 × 5
#   group  v2010  v2011  Year  value
#   <chr>  <dbl>  <dbl> <dbl>  <dbl>
# 1 A      0.233 NA      2010  0.233
# 2 A      0.233 NA      2012 -1.30 
# 3 A      0.233 NA      2013  1.42 
# 4 B     -0.685  0.718  2010 -0.685
# 5 B     -0.685  0.718  2011  0.718
# 6 B     -0.685  0.718  2012  0.447
# 7 B     -0.685  0.718  2013  0.816
Gregor Thomas
  • This resolves my issue, but do you know why case_when fails? – Zslice Jul 26 '23 at 21:22
  • See my comment on Andrea's answer. `case_when` (and `if_else` and `ifelse`) usually tries to calculate all the results and pick the right ones based on the test condition. They don't handle 0-length results well. – Gregor Thomas Jul 26 '23 at 22:47

Don't use value[Year == 2011] unguarded. The error occurs because the first group ('A') has no Year 2011, so the subset has length zero. Instead, check whether the subset is empty (!length(value[Year == 2011])) and return NA in that case:

set.seed(123)

myDf %>%
    group_by(group) %>%
    mutate(
        v2010 = case_when(2010 %in% Year ~ ifelse(!length(value[Year == 2010]),
                                                  NA,
                                                  value[Year == 2010]), T ~ NA),
        v2011 = case_when(2011 %in% Year ~ ifelse(!length(value[Year == 2011]),
                                                  NA,
                                                  value[Year == 2011]), T ~ NA)
    )

# A tibble: 7 × 5
# Groups:   group [2]
   Year   value group   v2010  v2011
  <dbl>   <dbl> <chr>   <dbl>  <dbl>
1  2010 -0.0867 A     -0.0867 NA    
2  2012  1.44   A     -0.0867 NA    
3  2013  1.13   A     -0.0867 NA    
4  2010  0.834  B      0.834  -0.287
5  2011 -0.287  B      0.834  -0.287
6  2012  0.373  B      0.834  -0.287
7  2013  0.403  B      0.834  -0.287
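
The same length check can be pulled into a small helper so it is not repeated per column. This is only a sketch: value_at is a hypothetical name, not a dplyr function, and the vectors are made-up stand-ins for one group.

```r
# Hypothetical helper (not part of dplyr): first value of x where key == target,
# or NA_real_ when nothing matches
value_at <- function(x, key, target) {
  hit <- x[key == target]
  if (length(hit) == 0) NA_real_ else hit[[1]]
}

Year  <- c(2010, 2012, 2013)   # made-up group without a 2011 row
value <- c(0.5, -1.2, 0.8)

in_2010 <- value_at(value, Year, 2010)   # a year that is present
in_2011 <- value_at(value, Year, 2011)   # a year that is absent
```

Inside the grouped pipeline this would be called as mutate(v2011 = value_at(value, Year, 2011)); the length-1 result is recycled to every row of the group.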
Jan

I think that

value[Year == 2011]

is evaluated even if 2011 is not in Year.

This may help in understanding what's going on:

myDf %>%
  group_by(group) %>%
  mutate(v2010 = case_when(2010 %in% Year ~ value[Year == 2010], 
                           T ~ NA),
         v2011 = case_when(3000 %in% Year ~ stop("this is being evaluated!"), 
                           T ~ NA_integer_)
  )

You could use:

myDf %>%
  group_by(group) %>% 
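  # is_empty() is not exported by dplyr; it is available from rlang (or purrr).
  # NA_real_ matches the double type of value.
  mutate(v2010 = ifelse(rlang::is_empty(value[Year==2010]), NA_real_, value[Year==2010]),
         v2011 = ifelse(rlang::is_empty(value[Year==2011]), NA_real_, value[Year==2011]))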
  • Would you have insight on why 3000 is being evaluated? – Zslice Jul 26 '23 at 21:29
  • Nice answer @Andrea! @Zslice, though it is about base `ifelse` not `dplyr::case_when()`, [this answer should give a general idea](https://stackoverflow.com/q/16275149/903061). Basically, for `ifelse()` or `case_when`, to take advantage of R's vectorization, they (generally) evaluate the possible results for both the TRUE and FALSE cases, and then use the test condition to pick out which result is appropriate. But that runs into issues in cases like this where one of the results is of length 0. – Gregor Thomas Jul 26 '23 at 22:45
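
The eager evaluation described in these comments can be checked outside of mutate(). The sketch below assumes only that dplyr is installed. It wraps a case_when() call whose first condition is FALSE in tryCatch() and records whether the right-hand side raised an error anyway.

```r
library(dplyr)

# The condition is FALSE, yet case_when() still evaluates the right-hand side,
# so the stop() call raises an error that tryCatch() intercepts.
outcome <- tryCatch(
  {
    case_when(FALSE ~ stop("RHS was evaluated"), TRUE ~ NA)
    "no error"
  },
  error = function(e) "error raised"
)
outcome
```

The exact wording of the error depends on the dplyr version, which is why the sketch records only that an error occurred.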