I’m fairly new to R. I’ve been searching for multiple hours and asking colleagues who have more experience with R for help , but still haven’t found a solution. I did find a related thread here that provided some inspiration, but I couldn't figure out how to extend the approach to solve my issue.
Overview
I have a dataset with 1800+ variables. The issue is that the dataset contains two sets of 140 variables with different data that were created from when the same survey went out at two different timepoints by accident based upon an automation error. The database created two versions of the same variable. I’m trying to index a unique column to check if the value for that row is “NA”, if it is then I want to fill it with values from the second set of 140 columns and overwrite the values in the original 140 columns. If the row isn’t “NA”, then I want it to leave it as is.
Dummy Data Set
Based on the below dummy set, what I’m attempting to do is check “week0date_v2” to see if it’s a NA value for a given row. If yes, then grab the values in that row from the columns from “week0date” to “var3” and write these values to same row in the columns from “week0date_v2” to “var3_v2”. If the value is not NA for a given row, then leave the original data.
library('tidyverse')
dummy_data<- data.frame(
id = c(1,2,3,4,5,6),
week0date_v2 = c("2022/2/1", NA, "2022/2/15", "2022/2/18", "2022/2/15", NA),
var1_v2 = c(75,NA,66,NA,65,NA),
var2_v2 = c(19,NA,67,10,54,NA),
var3_v2 = c(NA,NA,99,125,73,NA),
verify_v2 = c(1,NA,1,1,1,NA),
week0date = c(NA, "2022/3/2", "2022/4/15", "2022/4/18", NA, "2022/4/15"),
var1 = c(NA,52,NA,63,NA,75),
var2 = c(NA,78,NA,NA,NA,63),
var3 = c(15,64,25,NA,20,13),
verify = c(NA,1,NA,1,NA,1)
)
dummy_data <- dummy_data %>% mutate_if(is.character, as.Date)
id week0date_v2 var1_v2 var2_v2 var3_v2 verify_v2 week0date var1 var2 var3 verify
1 1 2022-02-01 75 19 NA 1 <NA> NA NA 15 NA
2 2 <NA> NA NA NA NA 2022-03-02 52 78 64 1
3 3 2022-02-15 66 67 99 1 2022-04-15 NA NA 25 NA
4 4 2022-02-18 NA 10 125 1 2022-04-18 63 NA NA 1
5 5 2022-02-15 65 54 73 1 <NA> NA NA 20 NA
6 6 <NA> NA NA NA NA 2022-04-15 75 63 13 1
Current Thinking
I’ve tried multiple approaches and haven’t been successful.
My colleague and I first tried to get it to work for a single variable (just the “week0date_v2” column), which we were able to do using mutate() and case_when() approach below:
dummy_data_first_attempt <- dummy_data %>%
mutate(
week0date_v2 = case_when(
!is.na(week0date_v2) ~ week0date_v2,
is.na(week0date_v2) ~ week0date
))
I don’t know how to extend the above approach to apply across the span of columns “week0date_v2” to “var3_v2." I tried writing a function to do this, but my skills with writing a function weren’t sufficient.
Any help with how to extend what the above to achieve the below desired outcome would be so greatly appreciated !!!
Desired Outcome
This is what I'm hoping to end up with (i.e., the values from “week0date” to “var3” are copied and pasted into “week0date_v2” to “var3_v2” when a given row in "week0date_v2" is NA ... else leave the original data)
dummy_data_wrangled <- data.frame(
id = c(1,2,3,4,5,6),
week0date_v2 = c("2022/2/1", "2022/3/2", "2022/2/15", "2022/2/18", "2022/2/15", "2022/4/15"),
var1_v2 = c(75,52,66,NA,65,75),
var2_v2 = c(19,78,67,10,54,63),
var3_v2 = c(NA,64,99,125,73,13),
verify_v2 = c(1,NA,1,1,1,NA),
week0date = c(NA, "2022/3/2", "2022/4/15", "2022/4/18", NA, "2022/4/15"),
var1 = c(NA,52,NA,63,NA,75),
var2 = c(NA,78,NA,NA,NA,63),
var3 = c(15,64,25,NA,20,13),
verify = c(NA,1,NA,1,NA,1)
)
dummy_data_wrangled
id week0date_v2 var1_v2 var2_v2 var3_v2 verify_v2 week0date var1 var2 var3 verify
1 1 2022/2/1 75 19 NA 1 <NA> NA NA 15 NA
2 2 2022/3/2 52 78 64 NA 2022/3/2 52 78 64 1
3 3 2022/2/15 66 67 99 1 2022/4/15 NA NA 25 NA
4 4 2022/2/18 NA 10 125 1 2022/4/18 63 NA NA 1
5 5 2022/2/15 65 54 73 1 <NA> NA NA 20 NA
6 6 2022/4/15 75 63 13 NA 2022/4/15 75 63 13 1