-1

I am trying to create a new variable that replaces the total scores of an old variable assuming certain conditions are met. The code I have tried is:

Baseline_Data_V2 <- Baseline_Data_V1 %>% 
  mutate(NewVariable = case_when(KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 3 ~ -3.711,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 4 ~ -2.474,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 5 ~ -1.813,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 6 ~ -1.312,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 7 ~ -0.88,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 8 ~ -0.476,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 9 ~ -0.079,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 10 ~ 0.327,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 11 ~ 0.757,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 12 ~ 1.227,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 13 ~ 1.77,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 14 ~ 2.475,
                                KY27SOC_NA_Total == 1 &
                                  is.na(KY27SOC2) & 
                                  KY27SOC_Total_Adjusted == 15 ~ 3.754,
                                .default = OldVariableScores))
#> Error in Baseline_Data_V2 %>% mutate(NewVariable = case_when(KY27SOC_NA_Total == : could not find function "%>%"

Created on 2023-08-03 with reprex v2.0.2

Basically, if the total number of NAs in the variable KY27SOC_NA_Total is 1, and KYSOC2 is an NA value, and the total of KYSOC_Total is 3, I want the new variable to have a value of -3.711. Else, I want the value of the OldVariableScores to be upheld. This is then replicated for different total scores of KYSOC_Total, with the other two conditions always being upheld, and the value of the OldVariableScores being upheld if the conditions are not met (this will range from 3 to 20, but can also have NAs). I have tried multiple options but basically the outcome is always a column full of NAs. If anyone has any idea of where I am going wrong, I'd really appreciate your feedback. Thank you.

P.S - I found that there was an error with one of the variables, that is why it was not working. Otherwise this code seems to do the trick. It seems that the suggestion below also works, so I will give it a +1. My apologies for initially uploading an image, reprex() in renv environment was not working so I foolishly decided to upload the image. THANK YOU to all who spent time looking at this, it is much appreciated!!

  • 9
    Please help to resolve the issue by boiling it down to a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example , https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors – I_O Aug 03 '23 at 15:12
  • 4
    Also, please don't share screenshots of code. Copy/paste code along with your reproducible example so it can be copy/pasted for creating answers. – Gregor Thomas Aug 03 '23 at 15:19
  • 6
    From your screenshot, one easy improvement is to handle the case where `KY27SOC2` is not `NA` first so you don't have to include `& is.na(KY27SOC2)` in every single condition. Start with `!is.na(KY27SOC2) ~ OldVariableScores` and then do the rest. But this also looks painful to code up as `case_when` entirely, maybe use a look-up table and a join instead? No way to know/test with sample data. – Gregor Thomas Aug 03 '23 at 15:30

1 Answers1

2

Here's a guess.

library(dplyr)
tmp <- tribble( # from the tibble package, re-exported by dplyr
  ~ KY27SOC_NA_Total, ~KY27SOC_Total, ~ NewVariableScores,
  1, 3, -3.711,
  1, 4, -2.474,
  1, 5, -1.813,
  1, 6, -1.312,
  1, 7, -0.88,
  1, 8, -0.476,
  1, 9, -0.079,
  1, 10, 0.327,
  1, 11, 0.757)
Baseline_Data_V1 %>%
  left_join(tmp, by = c("KY27SOC_NA_Total", "KY27SOC_Total")) %>%
  mutate(NewVariableScores = if_else(is.na(KY27SOC2), coalesce(NewVariableScores, OldVariableScores),
                                     OldVariableScores))

The premise is what GregorThomas suggested: a lookup table that we join in. I'm extending it with coalesce to keep old values.

For more information on the merge/join concept, see

coalesce is effectively if_else(is.na(a), b, a) (though it can deal with more than two columns). In effect, it means "give me the first non-NA value, prioritizing the first arguments over the following arguments. It is a vectorized operation. C.f.,

coalesce(1, 2, 3)    # first value is non-NA, return it
# [1] 1
coalesce(NA, 2, 3)   # second value is the first non-NA
# [1] 2
coalesce(1, NA, 3)   # first value is still first, ignore rest (still)
# [1] 1
coalesce(NA, NA, 3)
# [1] 3
r2evans
  • 141,215
  • 6
  • 77
  • 149