0

I am trying to join two dataframes together by their dates. The complicating factor is the two dataframes look slightly different. I'll use the sample data from a previous post:

> eventdates
# A tibble: 2 × 4
  event.no dr.rank   dr.start     dr.end
     <int>   <int>     <date>     <date>
1        1      14 1964-09-30 1964-10-06
2        2      16 1964-11-01 1964-12-24
> ts1964 <- data_frame(DATE = seq(from = as.Date("1964-01-01"), 
+                                 to = as.Date("1964-12-31"), 
+                                 by = "days"),
+                      Q = 1:366)
> 

I was planning to use lapply to create a list that lets me expand out the data from eventdates:

lapply(split(eventdates, seq(nrow(eventdates))), 
       function(x) { 
         filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })

This works in expanding out the dates from eventdates, and getting the column names correct. However, I have realised that this does not retain the event.no grouping variable, nor does it successfully unlist into a dataframe, and melt does not seem to work either.

My question is, how can I join these two dataframes together? Essentially, I require the ts1964 dataframe to have an event.no column (where there is no event, the event.no can be zero or NA etc.)

#

A slice of the expected output should look something like this:

> output <-
+   ts1964 %>%
+   mutate(event.no = 0)
> output$event.no[274:280] <- 1
> output$event.no[306:359] <- 2
> output %>%
+   slice(270:290)
# A tibble: 21 × 3
         DATE     Q event.no
       <date> <int>    <dbl>
1  1964-09-26   270        0
2  1964-09-27   271        0
3  1964-09-28   272        0
4  1964-09-29   273        0
5  1964-09-30   274        1
6  1964-10-01   275        1
7  1964-10-02   276        1
8  1964-10-03   277        1
9  1964-10-04   278        1
10 1964-10-05   279        1
# ... with 11 more rows
> 
Quinn
  • 419
  • 1
  • 5
  • 21
  • Have you tried `?merge`? – Oliver Frost Jan 17 '17 at 10:20
  • Please show expected output – Sotos Jan 17 '17 at 10:21
  • @Oliver `merge` won't work until the dates in `eventdates`has been expanded out – Quinn Jan 17 '17 at 10:37
  • @Sotos I've added a slice of what I am trying to make the data look like – Quinn Jan 17 '17 at 10:37
  • This is something for `data.table` `foverlaps` merges. Have a look at http://stackoverflow.com/questions/24480031/roll-join-with-start-end-window – Eric Lecoutre Jan 17 '17 at 10:47
  • 3
    @akrun please find a better dupe as this question cannot be solved with the answers in the target – Jaap Jan 17 '17 at 11:22
  • @Jaap This looks like a general dupe. If you are rooting for exactness, the one you duped [here](http://stackoverflow.com/questions/41694902/r-data-table-extract-value-from-alternative-columns-based-on-value-from-anothe) is not exact as well – akrun Jan 17 '17 at 11:40

2 Answers2

4

Picking up from your resulting list,

l1 <- lapply(split(eventdates, seq(nrow(eventdates))), 
              function(x) { 
                  filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })

do.call(rbind, Map(cbind, lapply(split(eventdates, seq(nrow(eventdates))), '[', 1), l1))

#     event.no       DATE   Q
#1.1         1 1964-09-30 274
#1.2         1 1964-10-01 275
#1.3         1 1964-10-02 276
#1.4         1 1964-10-03 277
#1.5         1 1964-10-04 278
#1.6         1 1964-10-05 279
#1.7         1 1964-10-06 280
#2.1         2 1964-11-01 306
#2.2         2 1964-11-02 307
#2.3         ...
Sotos
  • 51,121
  • 6
  • 32
  • 66
4

You could use the data.table-package as follows:

library(data.table)
# convert ts1964 to a 'data.table
setDT(ts1964)
# create a new 'data.table' with event dates in long form
ev.dates.2 <- setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)]

# join with ts1964
ts1964[ev.dates.2, on = 'DATE', event := event.no]

If you want to replace the NA's with zero's, you can replace the last line with:

ts1964[ev.dates.2, on = 'DATE', event := event.no][is.na(event), event := 0]

Everything in one go:

setDT(ts1964)[setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)], 
              on = 'DATE', event := event.no
              ][is.na(event), event := 0]

Used data:

ts1964 <- data.frame(DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"), Q = 1:366)

eventdates <- structure(list(event.no = 1:2, dr.rank = c(14L, 16L), 
                             dr.start = structure(c(-1919, -1887), class = "Date"), 
                             dr.end = structure(c(-1913, -1834), class = "Date")), 
                        .Names = c("event.no", "dr.rank", "dr.start", "dr.end"), row.names = c(NA, -2L), class = "data.frame")
Jaap
  • 81,064
  • 34
  • 182
  • 193