1

I have two data frames that I want to combine; however, I only want to keep one date. df1 will be the months from Jan 01, 2013 to Oct, 1, 2016. df2 will contain the frequency of occurrence of an event. If there was no event that month, df2 will not show a value.

df1 <- data.frame(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))

    df1
    date            Freq 
    1  2013-01-01    0
    2  2013-02-01    0
    3  2013-03-01    0
    4  2013-04-01    0
    5  2013-05-01    0
    ...

    df2
    date            Freq
    1  2013-03-01    1
    2  2013-08-01    2
    3  2014-04-01    5
    4  2014-05-01    2
    5  2014-06-01    5
    ...

I want the new data frame to look like the following.

    date            Freq 
    1  2013-01-01    0
    2  2013-02-01    0
    3  2013-03-01    1
    4  2013-04-01    0
    5  2013-05-01    0
    6  2013-06-01    0
    7  2013-07-01    0
    8  2013-08-01    2
    9  2013-09-01    0
    ...
MJ30
  • 13
  • 4

3 Answers3

0

You can merge with all.x=TRUE and then set the NA's resulting from the merge to zero:

out <- merge(df1,df2,all.x=TRUE)
out[is.na(out)] <- 0
head(out,10)
##         date Freq
##1  2013-01-01    0
##2  2013-02-01    0
##3  2013-03-01    1
##4  2013-04-01    0
##5  2013-05-01    0
##6  2013-06-01    0
##7  2013-07-01    0
##8  2013-08-01    2
##9  2013-09-01    0
##10 2013-10-01    0

Data: where df1 is created as in the OP:

df1 <- data.frame(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))

df1 <- structure(list(date = structure(c(15706, 15737, 15765, 15796, 
15826, 15857, 15887, 15918, 15949, 15979, 16010, 16040, 16071, 
16102, 16130, 16161, 16191, 16222, 16252, 16283, 16314, 16344, 
16375, 16405, 16436, 16467, 16495, 16526, 16556, 16587, 16617, 
16648, 16679, 16709, 16740, 16770, 16801, 16832, 16861, 16892, 
16922, 16953, 16983, 17014, 17045, 17075), class = "Date")), .Names = "date", row.names = c(NA, 
-46L), class = "data.frame")
##         date
##1  2013-01-01
##2  2013-02-01
##3  2013-03-01
##4  2013-04-01
##5  2013-05-01
## ...
##42 2016-06-01
##43 2016-07-01
##44 2016-08-01
##45 2016-09-01
##46 2016-10-01

df2 <- structure(list(date = structure(c(15765, 15918, 16161, 16191, 
16222), class = "Date"), Freq = c(1L, 2L, 5L, 2L, 5L)), .Names = c("date", 
"Freq"), row.names = c(NA, -5L), class = "data.frame")
##        date Freq
##1 2013-03-01    1
##2 2013-08-01    2
##3 2014-04-01    5
##4 2014-05-01    2
##5 2014-06-01    5
aichao
  • 7,375
  • 3
  • 16
  • 18
0

With dplyr for its joins,

library(dplyr)

full_join(df1, df2) %>% 
    group_by(date) %>% 
    summarise(Freq = sum(Freq))

## # A tibble: 9 × 2
##         date  Freq
##       <date> <int>
## 1 2013-01-01     0
## 2 2013-02-01     0
## 3 2013-03-01     1
## 4 2013-04-01     0
## 5 2013-05-01     0
## 6 2013-08-01     2
## 7 2014-04-01     5
## 8 2014-05-01     2
## 9 2014-06-01     5

or the base equivalent,

aggregate(Freq ~ date, merge(df1, df2, all = TRUE), sum)

##         date Freq
## 1 2013-01-01    0
## 2 2013-02-01    0
## 3 2013-03-01    1
## 4 2013-04-01    0
## 5 2013-05-01    0
## 6 2013-08-01    2
## 7 2014-04-01    5
## 8 2014-05-01    2
## 9 2014-06-01    5

Order after the fact if you like.

alistaire
  • 42,459
  • 4
  • 77
  • 117
0

There's the data.table way

library(data.table)
#Create the data
set.seed(1234)
df1 <- data.table(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))
df2 <- data.table(date=sample(df1$date, size= 10), freq=sample(1:10, 10, replace=T))

#Set keys
setkey(df1, date)
setkey(df2, date)

#data.table magic
df1[df2, freq := freq ]
df1[!df2, freq := 0 ]
df1

Result:

            date freq
 1: 2013-01-01    3
 2: 2013-02-01    0
 3: 2013-03-01    0
 4: 2013-04-01    0
 5: 2013-05-01    0
 6: 2013-06-01    7
 7: 2013-07-01    0
 ...
Serban Tanasa
  • 3,592
  • 2
  • 23
  • 45