0

Problems with conditional merging of data in R using a loop

I have a main dataset of 48134 unique observations with a total of 35 variables (a surgical population 2015-2020). One is LopNr which is a unique case identifier, two others are OPERATION_START (POSIXct %d%B%Y:%H:%M:%S) and YearOfSurgery (character) which both include the year a patient underwent surgery.

I now want to include additional socioeconomic data i have stored as six separate csv files, one for each inclusion year. I want to add SES data for each case based on the year it underwent surgery. If surgery in 2015, extract data from the 2015 csv file, for example. I also want the variables i add to be regarded as one, even though var1 can be extracted from any of the six csv files depending on year of surgery.

I'm using tidyverse and the last loop i tried was as follows (Raks_SummaInk is one of the variables i want to extract):

#TEST OF LOOP 230523 08:24

years <- c(2015, 2016, 2017, 2018, 2019, 2020)

HIP2_SPOR_SES <- HIP2_SPOR  # Create a new dataset to store the merged data (basically a copy of HIP2_SPOR)

for (year in years) {
  csv_file <- paste0("MC_Lev_LISA_", year, ".csv")
  
  socioeco_data <- read_csv(csv_file) %>%
    select(Lopnr, Raks_SummaInk)
  
  merged_data <- HIP2_SPOR_SES %>%
    filter(YearOfSurgery == as.character(year)) %>%
    left_join(socioeco_data, by = c("LopNr" = "Lopnr")) %>%
    summarize(Raks_SummaInk = sum(Raks_SummaInk, na.rm = TRUE))
  
  HIP2_SPOR_SES <- merge(HIP2_SPOR_SES, merged_data, all = TRUE)  # Merge the new data with the existing dataset
  
}

This resulted in a new dataset HIP2_SPOR_SES with 45 variables, all the new ones named Raks_SummaInk but with different suffixes, and unfortunately just NAs...

zx8754
  • 52,746
  • 12
  • 114
  • 209

1 Answers1

0

one approach, taking advantage of Mapping and Reduceing:

library(dplyr)
  • create example surgic. data:
main_data <- data.frame(LopNr = paste0('ID_', 1:2),
                        OPERATION_START = as.Date(c('2015/04/08 10:00:00', '2016/02/16 11:00:00')),
                        YearOfSurgery = as.character(2015:2016)
                        )
# > main_data
  LopNr OPERATION_START YearOfSurgery
1  ID_1      2015-04-08          2015
2  ID_2      2016-02-16          2016
  • set path to the folder with socio-economic files:
data_dir <- 'csv_dir' ## (replace with path to your soceco csv folder)
  • create some example csv files with socioeconomic data:
read.table(text = 'RaksSummaInk,Lopnr\n123,ID_1\n516,ID_2\n', sep = ',', header = TRUE) |>
  write.csv(file = 'csv_dir/MC_Lev_LISA_2015.csv') 
read.table(text = 'RaksSummaInk,Lopnr\n725,ID_1\n400,ID_2\n', sep = ',', header = TRUE) |>
  write.csv(file = 'csv_dir/MC_Lev_LISA_2016.csv') 
  • read in, row-bind and transform (rename, make unique) the soceco files:
soceco_data <- 
  c(2015, 2016) |>
  Map(f = \(yr) file.path(data_dir, sprintf('MC_Lev_LISA_%s.csv', yr)) |>
                read.csv() |>
                mutate(year = yr)
      ) |>
  Reduce(f = bind_rows) |>
  distinct(LopNr = Lopnr, ## match captialisation in main_data
           RaksSummaInk,
           YearOfSurgery = as.character(year))
> soceco_data
  RaksSummaInk LopNr YearOfSurgery
1          123  ID_1          2015
2          516  ID_2          2015
3          725  ID_1          2016
4          400  ID_2          2016
  • join soceco data to main data:
main_data |>
  left_join(soceco_data, c('LopNr', 'YearOfSurgery'))
  LopNr OPERATION_START YearOfSurgery RaksSummaInk
1  ID_1      2015-04-08          2015          123
2  ID_2      2016-02-16          2016          400
I_O
  • 4,983
  • 2
  • 2
  • 15
  • 1
    This worked. So thankful for your help! Had some trouble reading the csv files (probably due to some odd Swedish letters included), but could solve this by using `read_csv` instead. Happy coding and thanks again for the assistance, this seems like a great community to be part of. – Rasmus Åhman May 25 '23 at 05:52
  • You're welcome :-) Find more about the "character encoding hell" here: https://stackoverflow.com/questions/18789330/r-on-windows-character-encoding-hell/ – I_O May 25 '23 at 05:55