1

Assume I have df1:

Start_Date    End_Date     Value
2001-01-01    2001-12-31   1
2002-01-01    2002-12-31   2
2003-01-01    2003-12-31   3
2004-01-01    2004-12-31   4
2005-01-01    2005-12-31   5 

& df2:

DateTime      Gain   People
2003-01-01    3      3
2003-05-09    5      4
2004-12-31    1      2
2005-01-31    -2     2
2005-08-13    9      7
2006-09-10    6      8
2007-10-03    7      5

What I would like to do is to import Value from df1 into df2 by checking which DateTime in df2 falls in between Start_Date & End_Date in df1. If dates do not fall into any of periods in df1, return Value 0. A view of desired outcome:

DateTime      Gain   People   Value
2003-01-01    3      3        3
2003-05-09    5      4        3
2004-12-31    1      2        4
2005-01-31    -2     2        5
2005-08-13    9      7        5
2006-09-10    6      8        0
2007-10-03    7      5        0

Please advise

Agnes Lee
  • 322
  • 1
  • 12
  • 1
    Using data.table: `library(data.table) ; setDT(df2)[setDT(df), Value := i.Value, on = .(DateTime >= Start_Date, DateTime <= End_Date)]` – David Arenburg Jun 02 '20 at 10:28

2 Answers2

0

The underlying challenge is joining on inequality conditions. These are straightforward in sql, but not in R as dplyr only joins on equality. Here is the a link to the primary answer on S.O.

The best way I have found is to do a more general join and then filter for the inequalities.

df1 = df1 %>% mutate(join_id = 1)
df2 = df2 %>% mutate(join_id = 1)

output = left_join(df2, df1, by = 'join_id') %>%
  filter(Start_Date <= DateTime,
         DateTime <= End_Date) %>%
  select(DateTime, Gain, People, Value) %>%
  mutate(Value = ifelse(is.na(Value), 0, Value)

Explanation:

  • As your dataframes do not have an existing ID column we first create one. This is unnecessary if you want to join on some equality constraints and some inequality constraints.
  • We use a (much) more general join followed by a filter for the inequality constraints.
  • ifelse(is.na(... is used to replace the missing values.

In some cases this much more general join can create performance issues. But as R uses lazy-evaluation by default, if you filter immediately after the join then R should run both statements as part of the same join and avoid any performance problem.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
0

For some reason my previous answer was deleted by a moderator which means I can't undelete it - anyway answer reproduced below with a solution taken from one of the links.

You might want to do this with a left range join

Merge 2 dataframes if value within range

Join tables by date range

matching time a time in the interval between a start and end time

These three SO posts should give you a good starting place

EDIT: I think using sqldf is the easiest (clearest syntax anyway) from the 3 links above.

library(sqldf)
library(lubridate)
df1 <- data.frame(Start_Date=c(ymd("2001-01-01"),
                               ymd("2002-01-01"),
                               ymd("2003-01-01"),
                               ymd("2004-01-01"),
                                ymd("2005-01-01")),
                    End_Date=c(ymd("2001-12-31"),
                               ymd("2002-12-31"),
                               ymd("2003-12-31"),
                               ymd("2004-12-31"),
                               ymd("2005-12-31")),
                    Value=c(1,2,3,4,5))

df2 <- data.frame(DateTime=c(ymd("2003-01-01"),
                             ymd("2003-05-09"),
                             ymd("2004-12-31"),
                             ymd("2005-01-31"),
                             ymd("2005-08-13"),
                             ymd("2006-09-10"),
                             ymd("2007-10-03")),
                  Gain=c(3,5,1,-2,9,6,7),
                  People=c(3,4,2,2,7,8,5))

sqldf("SELECT DateTime,Gain,People,COALESCE(Value,0) AS Value FROM df2
            LEFT JOIN df1
            ON df2.DateTime BETWEEN df1.Start_Date AND df1.End_Date")

Coalesce changes NA (non matching) values to 0. The rest is pretty self explanatory.

Greg
  • 209
  • 1
  • 3