2

I have to dfs (dfA and dfB) that contain dates and I want to populate some columns in dfA with data from dfB based in some simple opreations.

Say df A has the following structure:

Location Mass  Date

A        0.18  10/05/2001 

B        0.25  15/08/2006

C        0.50  17/12/2019

Df B contains

Date  Event  Time

Where date has a wide range of dates. I would like to look in dfB for the dates in dfA and retrieve "Event" and "Time" data from dfB based in simple date operations, such getting data from one, two or three days from that showing in "Date" on dfA, giving me something like:

Location Mass  Date        Event 1 Event 2 Event 3 

A        0.18  10/05/2001   (w)    (x)     (y)

B        0.25  15/08/2006   (z)    (z1)    (z2)

Where (w) would be the data extracted from "Event" in dfB on "Date" (-1) day from "Date" specified in dfA (09/05/2001), then (x) would retrieve the data from "Event" in dfB on "Date" (-2) days from that in df A (08/05/2001) and so on.

I believe using dplyr and lubridate could sort this out.

Artem
  • 3,304
  • 3
  • 18
  • 41
RGR_288
  • 51
  • 6
  • 1
    Can you please provide a minimal reproducible example? In absence of that, I can just suggest you to check that the Date column in both the df have same class and same format. After that, you can use dplyr's inner_join function to do the work. – Eva Nov 14 '21 at 12:35
  • 1
    (1) As Eva stated, please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with a sample input and your expected output. (2) This sounds like a merge/join problem combined with a from long to wide operation. Take a look at [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) and [How to reshape data from long to wide format](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format). – Martin Gal Nov 14 '21 at 14:19

1 Answers1

0

You can add dummy variables with lagged dates (day - 1, day - 2 etc.) then use a series of left_join to achieve intended results. Please see the code below:

library(lubridate)
library(tidyverse)

# Simulation
dfa <- tibble(location = LETTERS[1:4],
              mass = c(0.18, 0.25, 0.5, 1),
              date = dmy(c("10/05/2001", "15/08/2006", "15/07/2006", "17/12/2019")))

dfb <- tibble(date = dmy(c("9/05/2001", "13/08/2006", "13/07/2006", "14/12/2019")),
              event = c("day-1a", "day-2a", "day-2b", "day-3"))

# Dplyr-ing, series of left_joins
dfc <- dfa %>%
  mutate(date_1 = date - 1,
         date_2 = date - 2,
         date_3 = date - 3) %>%
  left_join(dfb, by = c("date_1" = "date")) %>%
  rename(event1 = event) %>%
  left_join(dfb, by = c("date_2" = "date")) %>%
  rename(event2 = event) %>%
  left_join(dfb, by = c("date_3" = "date")) %>%
  rename(event3 = event) %>%
  select(-starts_with("date_"))
  
dfc

Output:

# A tibble: 4 x 6
  location  mass date       event1 event2 event3
  <chr>    <dbl> <date>     <chr>  <chr>  <chr> 
1 A         0.18 2001-05-10 day-1a NA     NA    
2 B         0.25 2006-08-15 NA     day-2a NA    
3 C         0.5  2006-07-15 NA     day-2b NA    
4 D         1    2019-12-17 NA     NA     day-3 
Artem
  • 3,304
  • 3
  • 18
  • 41