Problems with conditional merging of data in R using a loop
I have a main dataset of 48134 unique observations with a total of 35 variables (a surgical population, 2015-2020). One variable is LopNr, which is a unique case identifier. Two others are OPERATION_START (POSIXct, %d%B%Y:%H:%M:%S) and YearOfSurgery (character), both of which include the year a patient underwent surgery.
I now want to include additional socioeconomic (SES) data that I have stored as six separate CSV files, one for each inclusion year. I want to add SES data for each case based on the year it underwent surgery: if surgery took place in 2015, extract the data from the 2015 CSV file, and so on. I also want each added variable to end up as a single column, even though its value (e.g. var1) can come from any of the six CSV files depending on the year of surgery.
I'm using tidyverse, and the last loop I tried was as follows (Raks_SummaInk is one of the variables I want to extract):
# TEST OF LOOP 230523 08:24
years <- c(2015, 2016, 2017, 2018, 2019, 2020)
HIP2_SPOR_SES <- HIP2_SPOR  # Create a new dataset to store the merged data (basically a copy of HIP2_SPOR)
for (year in years) {
  csv_file <- paste0("MC_Lev_LISA_", year, ".csv")
  socioeco_data <- read_csv(csv_file) %>%
    select(Lopnr, Raks_SummaInk)
  merged_data <- HIP2_SPOR_SES %>%
    filter(YearOfSurgery == as.character(year)) %>%
    left_join(socioeco_data, by = c("LopNr" = "Lopnr")) %>%
    summarize(Raks_SummaInk = sum(Raks_SummaInk, na.rm = TRUE))
  HIP2_SPOR_SES <- merge(HIP2_SPOR_SES, merged_data, all = TRUE)  # Merge the new data with the existing dataset
}
This resulted in a new dataset HIP2_SPOR_SES with 45 variables instead of the expected 36. All the new ones are named Raks_SummaInk with different suffixes, and unfortunately they contain only NAs.
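One thing worth noting about the loop above: summarize() without a preceding group_by() collapses the joined rows into a single total and drops LopNr, so no per-case key is left to merge back on. A possible alternative, given here only as a minimal sketch under assumptions, is to stack the yearly files into one long table tagged with the year and then do a single left_join() on both the case ID and the year. The toy data, the temporary directory and the name ses_all are made up for illustration. The sketch also assumes each yearly file has at most one row per Lopnr.

```r
library(dplyr)
library(readr)

# Toy stand-ins for the real files and dataset (all values are made up)
tmp_dir <- tempdir()
years <- 2015:2016
write_csv(tibble(Lopnr = c(1, 2), Raks_SummaInk = c(100, 200)),
          file.path(tmp_dir, "MC_Lev_LISA_2015.csv"))
write_csv(tibble(Lopnr = c(1, 2), Raks_SummaInk = c(110, 220)),
          file.path(tmp_dir, "MC_Lev_LISA_2016.csv"))
HIP2_SPOR <- tibble(LopNr = c(1, 2), YearOfSurgery = c("2015", "2016"))

# Read each yearly file, keep the needed columns, and tag the rows with the year
ses_all <- bind_rows(lapply(years, function(year) {
  read_csv(file.path(tmp_dir, paste0("MC_Lev_LISA_", year, ".csv")),
           show_col_types = FALSE) %>%
    select(Lopnr, Raks_SummaInk) %>%
    mutate(YearOfSurgery = as.character(year))
}))

# Join once on case ID and year, so each case gets the value from "its" year
HIP2_SPOR_SES <- HIP2_SPOR %>%
  left_join(ses_all,
            by = c("LopNr" = "Lopnr", "YearOfSurgery" = "YearOfSurgery"))

print(HIP2_SPOR_SES)
```

Because the year is part of the join key, Raks_SummaInk ends up as a single column without suffixes. Further SES variables could be added to the select() call in the same way.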