0

Let's say I have two dataframes like the ones below:

df1 = structure(list(Date = c("2000-01-05", "2000-02-03", "2000-03-02", 
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", 
"2000-09-14", "2000-10-19", "2000-11-02", "2000-12-14", "2001-02-01", 
"2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-06-21", 
"2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06"
)), row.names = c(NA, 23L), class = "data.frame")

         Date
1  2000-01-05
2  2000-02-03
3  2000-03-02
4  2000-03-30
5  2000-04-13
6  2000-05-11
7  2000-06-08
8  2000-07-06
9  2000-09-14
10 2000-10-19
11 2000-11-02
12 2000-12-14
13 2001-02-01
14 2001-03-01
15 2001-04-11
16 2001-05-10
17 2001-06-07
18 2001-06-21
19 2001-07-05
20 2001-08-30
21 2001-10-11
22 2001-11-08
23 2001-12-06


df2 = structure(list(Date = structure(c(10987, 11016, 11047, 11077, 
11108, 11138, 11169, 11200, 11230, 11261, 11291, 11322, 11353, 
11381, 11412, 11442, 11473, 11503, 11534, 11565, 11595, 11626, 
11656, 11687), class = "Date"), x = c(3.04285714285714, 3.27571428571429, 
3.5104347826087, 3.685, 3.92, 4.29454545454545, 4.30857142857143, 
4.41913043478261, 4.59047619047619, 4.76272727272727, 4.82909090909091, 
4.82684210526316, 4.75590909090909, 4.9925, 4.78136363636364, 
5.06421052631579, 4.65363636363636, 4.53952380952381, 4.50545454545454, 
4.49130434782609, 3.9865, 3.97130434782609, 3.50727272727273, 
3.33888888888889)), row.names = c(NA, 24L), class = "data.frame")

         Date        x
1  2000-01-31 3.042857
2  2000-02-29 3.275714
3  2000-03-31 3.510435
4  2000-04-30 3.685000
5  2000-05-31 3.920000
6  2000-06-30 4.294545
7  2000-07-31 4.308571
8  2000-08-31 4.419130
9  2000-09-30 4.590476
10 2000-10-31 4.762727
11 2000-11-30 4.829091
12 2000-12-31 4.826842
13 2001-01-31 4.755909
14 2001-02-28 4.992500
15 2001-03-31 4.781364
16 2001-04-30 5.064211
17 2001-05-31 4.653636
18 2001-06-30 4.539524
19 2001-07-31 4.505455
20 2001-08-31 4.491304
21 2001-09-30 3.986500
22 2001-10-31 3.971304
23 2001-11-30 3.507273
24 2001-12-31 3.338889

Now, what I would like to do is to create a real-time dataframe, that is, the data in df2 that were only available at the time of df1. For instance, at 2000-01-05 (first row in df1) no data in df2 was available since since 2000-01-31 (first row of df2) occurs after 2000-01-05. However, in 2000-02-03(second row in df1) the observation in 2000-01-31 (first row of df2) is available. This should be the reasoning for every row. The outcome should look like this:

         Date    y
1  2000-01-05   NA 
2  2000-02-03  3.042857
3  2000-03-02  3.275714
4  2000-03-30  3.275714
5  2000-04-13  3.510435
6  2000-05-11  3.685000
....

The rule would be: pick up from df2 only the observation that was available at the time of df1.

Can anyone help me?

Thanks!

Rollo99
  • 1,601
  • 7
  • 15
  • Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – ekoam Jan 17 '22 at 21:13
  • @ekoam no this is a different problem. notice the dates aren't the same in each dataframe. –  Jan 17 '22 at 21:15
  • Hello, @ekoam, not really because in my question I am not trying to match the dates. It's more subtle my issue. – Rollo99 Jan 17 '22 at 21:15
  • I still consider this a duplicate. Please search for **non-equi join** following the link I provided. @Rollo99 – ekoam Jan 17 '22 at 21:19
  • That's a rolling join or roll join if you need a keyword, data.table does those – moodymudskipper Jan 19 '22 at 19:51

1 Answers1

1

What you can do is complete the df2 dates and then join.

library(dplyr)
library(tidyr)

# create a dataframe with all the days, not just the snapshots
df2_complete <- df2 %>% 
  complete(Date = seq.Date(min(Date), max(Date), by = "day")) %>% 
  fill(x, .direction = "down")

# convert to Date class for this case and join
df1 %>% 
  mutate(Date = as.Date(Date)) %>% 
  left_join(df2_complete, by = "Date")

Which gives:

         Date        x
1  2000-01-05       NA
2  2000-02-03 3.042857
3  2000-03-02 3.275714
4  2000-03-30 3.275714
5  2000-04-13 3.510435
6  2000-05-11 3.685000
....
  • 1
    thanks Adam! clever solution, it didn't come to my mind to fill the series at daily level and then left_join! Thanks a lot! – Rollo99 Jan 17 '22 at 21:22