
I have two data frames in R. Data1 has two columns (id, date) and Data2 has three columns (id, date, level). I want to add a level column to Data1 based on the level and date columns in Data2.

Data1 = data.frame(id = c(1,1,1), dates = c("2014-06","2016-02","2016-05"))

id  date
1  2014-06 
1  2016-02 
1  2016-05 

Data2 = data.frame(id = c(1,1,1), dates = c("2015-07","2016-04","2016-07"), level=c(3,4,5))

id    date     level
1     2015-07   3
1     2016-04   4
1     2016-07   5

So resulting data frame should be:

id  date    level
1  2014-06   NULL
1  2016-02    3
1  2016-05    4
Jaap
Arun

1 Answer


You can accomplish this with a rolling join from the data.table package, after converting the dates columns to class Date (see the note at the end of this post):

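library(data.table)

# Convert both data frames to data.tables by reference and key them on the join columns
setDT(Data1, key = c('id','dates'))
setDT(Data2, key = c('id','dates'))

# Rolling update join: add lev to Data1, taking level from the matching Data2 row
Data1[Data2, lev := level, roll = -Inf, rollends = c(TRUE,FALSE)][]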

which gives:

> Data1
   id      dates lev
1:  1 2014-06-01  NA
2:  1 2016-02-01   3
3:  1 2016-05-01   4

Explanation:

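  • Convert the data frames to data.tables with setDT and set the key to the columns needed for the join.
  • Join and create a new variable in Data1 with lev := level. With roll = -Inf you roll backwards: a Data2 date without an exact match in Data1 is matched to the next later Data1 date within the same id. With rollends = c(TRUE,FALSE) only the first value is rolled backwards; the last value is not rolled forwards, so Data2 dates after the last Data1 date stay unmatched.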
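To see which rows the rolling join pairs up, the same join can be run with which = TRUE instead of an assignment. This is only a sketch on the sample data from the question, rebuilt here directly as data.tables so the block runs on its own; the name matched is chosen for illustration.

```r
library(data.table)

# Sample data from the question, with dates already converted to class Date
Data1 <- data.table(id = c(1, 1, 1),
                    dates = as.Date(c("2014-06-01", "2016-02-01", "2016-05-01")))
Data2 <- data.table(id = c(1, 1, 1),
                    dates = as.Date(c("2015-07-01", "2016-04-01", "2016-07-01")),
                    level = c(3, 4, 5))

# For each row of Data2: the row number of Data1 it is matched to (NA = no match)
matched <- Data1[Data2, on = c("id", "dates"), roll = -Inf,
                 rollends = c(TRUE, FALSE), which = TRUE]
print(matched)  # 2 3 NA
```

The first two Data2 rows land on Data1 rows 2 and 3, and the last Data2 row (2016-07) finds no later Data1 date, which is why the first Data1 row keeps NA in the result above.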

Setting the keys beforehand isn't necessary. You could also do:

setDT(Data1)
setDT(Data2)

Data1[Data2, on = c('id','dates'), lev := level, roll = -Inf, rollends = c(TRUE,FALSE)][]

Used data:

Data1 = data.frame(id = c(1,1,1), dates = c("2014-06","2016-02","2016-05"))
Data2 = data.frame(id = c(1,1,1), dates = c("2015-07","2016-04","2016-07"), level=c(3,4,5))
Data1$dates <- as.Date(paste0(Data1$dates,'-01'))
Data2$dates <- as.Date(paste0(Data2$dates,'-01'))

NOTE: I converted the dates columns to class Date by appending the first day of the month to each value. This is necessary for the rolling join to work as specified.
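A small base-R sketch of why the day has to be appended, assuming year-month strings like those in the question (the variable names ym, res and d are illustrative only):

```r
ym <- c("2014-06", "2016-02", "2016-05")

# as.Date() cannot parse a bare year-month string, so this call raises an error;
# tryCatch() turns the error into a marker string so the script keeps running
res <- tryCatch(as.Date(ym), error = function(e) "parse error")
print(res)

# Appending a day gives an unambiguous ISO date that as.Date() accepts
d <- as.Date(paste0(ym, "-01"))
print(class(d))
print(d)
```

Any fixed day of the month would do for this purpose, as long as the same convention is used in both tables.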

Jaap
  • Thanks @Procrastinatus. I am getting an unexpected result for the following case:
    Data1 (ID, date): (1, 2013-12), (1, 2014-06), (1, 2015-09), (1, 2015-10)
    Data2 (id, date, level): (1, 2014-02, 3), (1, 2015-10, 6). The resulting dataset is (ID, date, level): (1, 2013-12, NA), (1, 2014-06, 3), (1, 2015-09, NA), (1, 2015-10, 6). In the final dataset, the level for 2015-09 should be 3, not NA.
    – Arun Dec 21 '16 at 18:18
  • @Arun according to your response to [LeoP's last comment](http://stackoverflow.com/questions/41259355/setting-a-value-in-one-dataframe-by-looking-its-value-in-another-dataframe-based#comment69720593_41259355), the `NA`-value is the correct one imo – Jaap Dec 21 '16 at 20:12
  • @LeoP. It is certainly worth the somewhat steep learning curve. Once you know the basics, you will start to appreciate the efficiency (also in typing) and the flexibility. – Jaap Dec 21 '16 at 20:17
  • @ProcrastinatusMaximus `data1 = data.frame(id=c(1,1,1,1), date=c("2013-12-01","2014-06-01","2015-09-01","2015-10-01")); data2 = data.frame(id=c(1,1), date=c("2014-02-01","2015-10-01"), level=c(3,6)); data1$date = as.Date(data1$date, format="%Y-%m-%d"); data2$date = as.Date(data2$date, format="%Y-%m-%d"); setDT(data1); setDT(data2); data1[data2, on = c('id','date'), lev:=level, roll=-Inf, rollends=c(TRUE,FALSE)][]` In this example, please see the output: the 3rd row has `NA` in the lev column, not 3. – Arun Dec 21 '16 at 21:35
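
Whether NA or 3 is the right value for 2015-09 depends on the intended rule, which the thread leaves open. The sketch below reproduces the example from the comments and compares the answer's join with one possible alternative: joining in the other direction with roll = Inf, so that each data1 row takes the most recent earlier data2 level. The alternative is an assumption about what may be wanted, not something the answer proposes; the names a and b are illustrative.

```r
library(data.table)

# Example data from the comment thread
data1 <- data.table(id = c(1, 1, 1, 1),
                    date = as.Date(c("2013-12-01", "2014-06-01",
                                     "2015-09-01", "2015-10-01")))
data2 <- data.table(id = c(1, 1),
                    date = as.Date(c("2014-02-01", "2015-10-01")),
                    level = c(3, 6))

# Variant A (the answer's join): each data2 row updates at most one data1 row,
# namely the next data1 date on or after it. copy() leaves data1 itself unchanged.
a <- copy(data1)[data2, on = c("id", "date"), lev := level, roll = -Inf]
print(a$lev)    # NA 3 NA 6

# Variant B (assumed alternative): for each data1 row, look up the latest
# data2 row on or before its date and carry that level forward
b <- data2[data1, on = c("id", "date"), roll = Inf]
print(b$level)  # NA 3 3 6
```

On the original question's data both variants produce the same levels (NA, 3, 4); they differ only when several data1 dates fall between two consecutive data2 dates.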