I have three dataframes (d1, d2, d3), where ncol and nrow do not match across datasets.
I want to combine these datasets by common columns, while preserving all unique columns and rows.
I believe this means I need a full join, which I've attempted with the example code below.
However, when I try to join by keys common to any two or more datasets, I get an error message: it appears that the column common to d2 and d3 throws an error, even though other keys common to only two datasets are ok.
How should I join these three datasets?
d1 <- data.frame(Y = "Y1",
N=1:3,
C= c(1, 3, 2),
D= c(3, 1, 4),
E= c(0, 1, 1),
Q= c(2, 0, 3))
d2 <- data.frame(Y = "Y2",
N=1:3,
E= c(0, 1, 1),
H= c(3, 1, 4),
S= c(2, 2, 0))
d3 <- data.frame(Y = "Y3",
N=1:4,
C= c(4, 2, 1, 3),
E= c(3, 1, 4, 2),
H= c(1, 3, 2, 1),
U= c(3, 3, 1, 1))
join_all <- full_join(d1, d2, d3,
by = c("Y", "N", "C", "E", "H"))
#Error: Join columns must be present in data.
#x Problem with `H`.
Just for example's sake, this is what I want the join result to look like:
> desired_df <- data.frame(
+ Y= c("Y1", "Y1", "Y1", "Y2", "Y2", "Y2", "Y3", "Y3", "Y3", "Y3"),
+ N= c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4),
+ C= c(1, 3, 2, "NA", "NA", "NA", 4, 2, 1, 3),
+ D= c(3, 1, 4, "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
+ E= c(0, 1, 1, 0, 1, 1, 3, 1, 4, 2),
+ H= c("NA","NA","NA", 3, 1, 4, 1, 3, 2, 1),
+ Q= c(2, 0, 3, "NA", "NA", "NA", "NA", "NA", "NA", "NA"),
+ S= c("NA", "NA", "NA", 2, 2, 0, "NA", "NA", "NA", "NA"),
+ U= c("NA", "NA", "NA", "NA", "NA", "NA", 3, 3, 1, 1))
> desired_df
Y N C D E H Q S U
1 Y1 1 1 3 0 NA 2 NA NA
2 Y1 2 3 1 1 NA 0 NA NA
3 Y1 3 2 4 1 NA 3 NA NA
4 Y2 1 NA NA 0 3 NA 2 NA
5 Y2 2 NA NA 1 1 NA 2 NA
6 Y2 3 NA NA 1 4 NA 0 NA
7 Y3 1 4 NA 3 1 NA NA 3
8 Y3 2 2 NA 1 3 NA NA 3
9 Y3 3 1 NA 4 2 NA NA 1
10 Y3 4 3 NA 2 1 NA NA 1
Finally: once joined, how would I turn the NAs into 0?
Thanks for your help!