I have a dataframe x which has two columns. Let's say they are A and B columns. A column are dates, B column are values that taken on that date. Both A and B column are continuous, no missing dates, and range from Jan to Oct in days. Right now I have another dataframe y which has column A and column C. A are also dates, C are values taken by another instrument on that specific date. However, A and C in this dataframe is not continuous (there are missing dates). For example, I have value on 1/1 but next value will be on 1/10. Now I would like to join values of C in dataframe y (discrete dates) to the first dataframe x which has continuos dates, according to the common A column, is there a efficient way to do this?
Asked
Active
Viewed 161 times
1 Answers
4
You could use merge
from base R
res1 <- merge(x,y, by="A", all.x=TRUE)
dim(res1)
#[1] 10 3
head(res1,3)
# A B C
#1 2011-04-03 1.3709584 0.2101654
#2 2011-04-04 -0.5646982 NA
#3 2011-04-05 0.3631284 NA
or left_join
from dplyr
(should be fast on big datasets)
library(dplyr)
res2 <- left_join(x,y, by="A") %>% #in case `date` column gets coerced to `numeric`
mutate(A= as.Date(A, origin='1970-01-01'))
all.equal(res1, as.data.frame(res2))
#[1] TRUE
Data
set.seed(42)
x <- data.frame(A=seq(as.Date("2011-04-03"), length.out=10, by=1), B=rnorm(10))
set.seed(384)
y <- data.frame(A=seq(as.Date("2011-04-03"), length.out=6, by=3), C=rnorm(6))

akrun
- 874,273
- 37
- 540
- 662
-
Thank you very much! It works! But instread of using `All.x=TRUE`, I used `ALL=TURE`. – Xin Qiao Oct 07 '14 at 19:20