I need to merge the data frames in a list by row while keeping the information contained in duplicate rows. Each row holds the daily observations of some variables (stock prices), and each data frame covers a different time span (years). The variables can change from one data frame to the next (the columns are the stocks in the index). bind_rows from dplyr does a great job of simply adding columns for the new variables and leaving NAs elsewhere.
The problem is that some of the data frames contain the last day of the previous period (which has therefore already been bound from the previous data frame), but with slightly different variables (columns). I don't want to drop one of the duplicate rows, because both contain information I need; I would rather merge them. For each column, the duplicate rows contain either the same value (because they refer to the same day) or one NA and one value (because they refer to different variables in the set). How can I do this?
The problem can be summarized in the following example:
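library(dplyr)
# Two periods that overlap on Date 4 and track different variables (B vs. C)
df_1 <- data.frame(Date = 1:4, A = c(20, 30, 20, 30), B = c(15, 16, 15, 16))
df_2 <- data.frame(Date = 4:7, A = c(30, 35, 60, 40), C = c(15, 18, 25, 20))
dfs <- list(df_1, df_2)
bind_rows(dfs)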
Outcome:
Date A B C
1 1 20 15 NA
2 2 30 16 NA
3 3 20 15 NA
4 4 30 16 NA
5 4 30 NA 15
6 5 35 NA 18
7 6 60 NA 25
8 7 40 NA 20
Desired outcome:
Date A B C
1 1 20 15 NA
2 2 30 16 NA
3 3 20 15 NA
4 4 30 16 15
5 5 35 NA 18
6 6 60 NA 25
7 7 40 NA 20
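
For reference, the merge rule described above (per column, the duplicates hold either the same value or one NA and one value) could be sketched as a grouped summary on top of bind_rows. This is only one possible approach, not a definitive solution: first_non_na is a hypothetical helper introduced here for illustration, and the sketch assumes dplyr 1.0.0 or later (for across) and that Date uniquely identifies a day.

```r
library(dplyr)

df_1 <- data.frame(Date = 1:4, A = c(20, 30, 20, 30), B = c(15, 16, 15, 16))
df_2 <- data.frame(Date = 4:7, A = c(30, 35, 60, 40), C = c(15, 18, 25, 20))
dfs <- list(df_1, df_2)

# Hypothetical helper: first non-NA value of a vector, or NA if all are missing.
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

merged <- bind_rows(dfs) %>%
  group_by(Date) %>%
  # Collapse each group of duplicate dates into one row, column by column.
  summarise(across(everything(), first_non_na), .groups = "drop")

print(merged)
```

Note that if two duplicate rows ever held different non-NA values in the same column, this sketch would silently keep the first one, so it relies on the assumption stated in the question. The result is a tibble ordered by Date rather than a plain data.frame.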