0

I have several data frames that contain information on temporarily ordered cases and one supplementary variable. I would like to merge the data frames in order to obtain a single frame in which cases are still ordered chronologically and associated to the correct variable value.

A minimal working example would be:

Cases <- c("Flanders", "Flanders", "Wallonia", "Wallonia")
From <- c(1830, 1900, 1830, 1950)
To   <- c(1899, 2020, 1949, 2020)
Variable1 <- c(0, 1, 0, 1)
Variable1.df <- data.frame(Cases, From, To, Variable1)

Cases <- c("Flanders", "Flanders", "Flanders", "Wallonia", "Wallonia", "Wallonia")
From <- c(1830, 1871, 1946, 1830, 1931, 1976)
To   <- c(1870, 1945, 2020, 1930, 1975, 2020)
Variable2 <- c(0, 1, 0, 0, 0, 1)
Variable2.df <- data.frame(Cases, From, To, Variable2)

Cases <- c("Flanders", "Flanders", "Flanders", "Flanders", "Flanders", "Wallonia", "Wallonia", "Wallonia", "Wallonia", "Wallonia")
From <- c(1830, 1850, 1900, 1950, 2000, 1830, 1850, 1900, 1950, 2010)
To   <- c(1849, 1899, 1949, 1999, 2020, 1849, 1899, 1949, 2009, 2020)
Variable3 <- c(0, 1, 0, 0, 1, 0, 1, 0, 0, 1)
Variable3.df <- data.frame(Cases, From, To, Variable3)

The output I am looking for would then be:

Cases     From  To    Variable1  Variable2  Variable3
Flanders  1830  1849  0          0          0
Flanders  1850  1870  0          0          1
Flanders  1871  1899  0          1          1
Flanders  1900  1945  1          1          0
Flanders  1946  1999  1          0          0
Flanders  2000  2020  1          0          1
...
Wallonia  2010  2020  1          1          1

Since I have to repeat the operation several times with a changing number of data frames to merge, I would like to create a function. Could someone help me with this? My sincere thanks in advance.

CNiessen
  • 89
  • 6
  • 1
    Do you need `Reduce(function(x, y) merge(x, y, all = TRUE), list(Variable1.df, Variable2.df, Variable3.df))` ? – Ronak Shah Nov 20 '20 at 15:08
  • Thanks a lot for helping. This goes in the right direction too, but when I run the code, the output does not correspond to the one I sought (see above) either: there are multiple Flanders 1830-xxxx and rows were not matched (for each row, two variables contains NAs. – CNiessen Nov 20 '20 at 15:32

2 Answers2

0

Maybe are you looking for this:

library(dplyr)
#Extract data into a list
v1 <- ls(pattern = '\\.df')
#Store in a list
List <- mget(v1)
#Merge
Merged <- Reduce(function(x, y) full_join(x, y),List) %>% replace(is.na(.),0)
Duck
  • 39,058
  • 13
  • 42
  • 84
  • Thanks a lot for helping! This goes in the right direction. Unfortunately, when I run the code with my mwe, I do not get the output I was looking for (see above). The data are not sorted by cases and the disaggregation does not seem to work correctly: I obtain different Flanders 1830-xxxx, for example. – CNiessen Nov 20 '20 at 15:27
  • @CNiessen Please try this code `Merged <- Reduce(function(x, y) full_join(x, y),List) %>% replace(is.na(.),0) %>% arrange(Cases)` and let me know if that works! – Duck Nov 20 '20 at 15:30
  • Many thanks! Cases are now sorted correctly, thanks a lot. However, for each of them there are still multiple periods that should be aggregated. E.g. Flanders 1830-1899/1900-2020 and Flanders 1830-1870/1871-1945/1946-2020 should be aggregated to Flanders 1830-1870/1871-1899/1900-1945/1946-2020. – CNiessen Nov 20 '20 at 15:40
  • @CNiessen I dont understand what you mean by aggregating maybe this? `Merged <- Reduce(function(x, y) full_join(x, y),List) %>% replace(is.na(.),0) %>% arrange(Cases,From,To)` – Duck Nov 20 '20 at 15:44
  • By aggregating, I mean that I do not want the same time period for the same case to be included twice. A shorter example (with periods that differ from my mwe to make it more obvious): if I had two rows Flanders 1830-1840 and Flanders 1830-1850, they should become Flanders 1830-1840 and Flanders 1841-1850. – CNiessen Nov 20 '20 at 16:04
0

We can use tidyverse methods

library(dplyr)
library(purrr)
library(tidyr)
mget(ls(pattern = "\\.df")) %>%
    reduce(full_join) %>% 
    mutate(across(everything(), replace_na, 0))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks a lot for helping! Unfortunately, the produced results is identical to the one of @Duck 's first suggestion. When I run the code with my mwe, I do not get the output I was looking for (see above). The data are not sorted by cases and the disaggregation does not seem to work correctly: I obtain different Flanders 1830-xxxx, for example. – CNiessen Nov 21 '20 at 22:18