0

I have two dataframes that I am trying to merge. One dataframe has a list of activities and start times of the activity. Another dataframe has a running clock with some minute by minute data. For example

df1

ID   ACTIVITY   Timestamp
1    Act1       2014-11-10 11:56:00
1    Act2       2014-11-10 11:59:00
1    Act3       2014-11-10 12:02:00
1    Act4       2014-11-10 12:06:00
...

df2

ID   Timestamp
1    2014-11-10 11:56:00
1    2014-11-10 11:57:00
1    2014-11-10 11:58:00
1    2014-11-10 11:59:00
1    2014-11-10 12:00:00
1    2014-11-10 12:01:00
1    2014-11-10 12:02:00
...

I want to merge these two so the ACTIVITY from df1 is applied to the matching timestamp AND all succeeding minutes until the next ACTIVITY timestamp

I have tried the following code but it only applies the activity to the matching timestamp

Im hoping for something like this

ID   Timestamp              ACTIVITY
1    2014-11-10 11:56:00    Act1
1    2014-11-10 11:57:00    Act1
1    2014-11-10 11:58:00    Act1
1    2014-11-10 11:59:00    Act2
1    2014-11-10 12:00:00    Act2
1    2014-11-10 12:01:00    Act2
1    2014-11-10 12:02:00    Act3
...

But I am getting this from the code I currently have

merge(df1, df2, by = c("ID", "Timestamp"), copy = TRUE, all.y = T)

ID   Timestamp              ACTIVITY
1    2014-11-10 11:56:00    Act1
1    2014-11-10 11:57:00    NA
1    2014-11-10 11:58:00    NA
1    2014-11-10 11:59:00    Act2
1    2014-11-10 12:00:00    NA
1    2014-11-10 12:01:00    NA
1    2014-11-10 12:02:00    Act3
...
Andy
  • 65
  • 1
  • 6
  • I think this old post might help you get from what you currently have to what you want. https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value – Zelos Zhu Apr 19 '19 at 23:44

1 Answers1

0

Try this

merge(df1, df2, by = c("ID", "Timestamp"), copy = TRUE, all.y = T) %>%
 mutate(ACTIVITY2 = case_when(is.na(ACTIVITY) ~ lag(ACTIVITY),
                              TRUE ~ ACTIVITY))
jsv
  • 740
  • 3
  • 5