0

I have one large dataframe df1 with many observations including the date of the observation. There are multiple observations for each date. I also have another dataframe df2 which contains two variables, a date and a new variable I would like to add to df1, which we'll call VarC. We can assume there is only one observation per date in df2.

Here's some simple example code:

df1Date <- as.Date(c('2010-11-1', '2010-11-1', '2010-11-2', '2010-11-2', '2010-11-2', '2010-11-2'))
VarA <- c("Red", "Blue", "Green", "Yellow", "Orange", "Black")
VarB <- c(1, 2, 3, 4, 5, 6)
df1 <- data.frame(df1Date, VarA, VarB)

df2date <- as.Date(c('2010-11-1','2010-11-2'))
VarC <- c("Good Day", "Bad Day")
df2 <- data.frame(df2date, VarC)

I would like to find an efficient way to add a new Variable DayType in df1 which would be equal to a value selected from VarC matching the date in df1. In other words, I would like to go through each observation in df1, look up the date df1Date for the matching df2date in df2, and append the analagous value of VarC to my df1 dataframe under a new variable DayType.

I'm familiar with the dplyr::mutate function, but I don't know how to appropriately index into the dataframes to accomplish what I'm trying to do.

The new variable DayType should look like:

DayType <- c("Good Day", "Good Day", "Bad Day", "Bad Day", "Bad Day", "Bad Day")  
ds_guy
  • 143
  • 2
  • 5

3 Answers3

1

Just use left join() from thedplyr library

Method:

df3 <- df1 %>% left_join(df2, by = c("df1Date" = "df2date"))

Output:

df3
     df1Date   VarA VarB     VarC
1 2010-11-01    Red    1 Good Day
2 2010-11-01   Blue    2 Good Day
3 2010-11-02  Green    3  Bad Day
4 2010-11-02 Yellow    4  Bad Day
5 2010-11-02 Orange    5  Bad Day
6 2010-11-02  Black    6  Bad Day
1

Here is a base R solution using merge:

merge(df1, df2, by.x = "df1Date", by.y = "df2date");
#     df1Date   VarA VarB     VarC
#1 2010-11-01    Red    1 Good Day
#2 2010-11-01   Blue    2 Good Day
#3 2010-11-02  Green    3  Bad Day
#4 2010-11-02 Yellow    4  Bad Day
#5 2010-11-02 Orange    5  Bad Day
#6 2010-11-02  Black    6  Bad Day
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
0

Since OP is looking for an efficient and fast way to get result hence my suggestion would be to use left_join approach using data.table.

library(data.table)
setDT(df1)
setDT(df2)

#left_join in data.table way
df1[df2, on=.(df1Date = df2date)]

# df1Date   VarA VarB     VarC
# 1: 2010-11-01    Red    1 Good Day
# 2: 2010-11-01   Blue    2 Good Day
# 3: 2010-11-02  Green    3  Bad Day
# 4: 2010-11-02 Yellow    4  Bad Day
# 5: 2010-11-02 Orange    5  Bad Day
# 6: 2010-11-02  Black    6  Bad Day
MKR
  • 19,739
  • 4
  • 23
  • 33