I could use some help with a tidyverse solution to this question.
I'm working with a large dataset that has 20+ binary cancer outcomes (cancer_{cancertype}) and a corresponding age variable for each ({cancertype}_age). Some individuals are missing cancer phenotype information. For each cancer type, I would like to set the age variable to NA wherever the matching cancer phenotype is missing. I've been trying to do this with mutate(across()), but I'm having trouble specifying the appropriate arguments.
# load tidyverse lib
library(tidyverse)
# Set seed for reproducibility
set.seed(42)
# generate dataframe
cancer_ds <- data.frame(
  id = 1000:1009,
  cancer_a = rep(0:1, length.out = 10),
  cancer_b = c(rep(0, 3), NA, NA, 1, NA, rep(1, 3)),
  cancer_c = c(rep(0:1, each = 2, length.out = 6), rep(NA, 4)),
  a_age = sample(30:60, 10, FALSE),
  b_age = sample(30:60, 10, FALSE),
  c_age = sample(30:60, 10, FALSE)
)
cancer_ds
# names of the cancer phenotype columns
cancer_list <- paste("cancer", letters[1:3], sep = "_")
cancer_list
# attempted code (does not work: replace() is not given the values to change or their replacement)
out_ds <- cancer_ds %>%
  mutate(across(ends_with("age"), ~replace(is.na(cancer_list))))
# expected output dataset
out_ds_exp <- cancer_ds %>%
  mutate(b_age = ifelse(b_age %in% c("43", "49", "47"), NA, b_age),
         c_age = ifelse(c_age %in% c("49", "31", "37", "32"), NA, c_age))
out_ds_exp
Any help is appreciated! Thanks.
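For reference, here is a minimal sketch of one direction that might work. It assumes every {cancertype}_age column has a matching cancer_{cancertype} column with the same suffix. It uses cur_column() inside across() to build the name of the paired phenotype column and get() to look that column up. The small data frame is an illustrative stand-in, not the real data, and the get() lookup is only one way to fetch the paired column.

```r
library(dplyr)

# Illustrative stand-in data (values are made up for this sketch)
cancer_ds <- data.frame(
  id       = 1000:1004,
  cancer_a = c(0, 1, 0, 1, 0),
  cancer_b = c(0, NA, 1, NA, 1),
  a_age    = c(40, 45, 50, 55, 60),
  b_age    = c(41, 46, 51, 56, 61)
)

out_ds <- cancer_ds %>%
  mutate(across(
    ends_with("_age"),
    # cur_column() is the age column being processed, e.g. "b_age";
    # strip "_age" and prepend "cancer_" to get its phenotype column,
    # then blank out the ages where that phenotype is NA
    ~ replace(
      .x,
      is.na(get(paste0("cancer_", sub("_age$", "", cur_column())))),
      NA
    )
  ))

out_ds
```

With this stand-in data, b_age should come out as NA in the two rows where cancer_b is NA, and a_age should be unchanged. If an age column has no matching phenotype column, get() will fail or pick up an unrelated object, so the name-pairing assumption needs to hold for all 20+ columns.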