2

I have two data frames mondays & tdates as follows :

T Dates 
User.ID   tdate
1       11-02-2013
1       04-03-2013
1       16-04-2015
1       03-05-2015
1       05-05-2015
1       11-05-2015
1       29-09-2015
1       26-11-2013
1       28-11-2013
3       01-02-2016
4       22-11-2012
4       25-04-2013
4       29-05-2013



Mondays     
ID  Monday      Closest Date
1   05-09-2016  
1   20-04-2015  
1   27-07-2015  
1   08-06-2015  
1   13-10-2014  
3   16-09-2013  
3   16-02-2015  
3   29-08-2016  
3   26-05-2014  
3   29-02-2016  
3   18-07-2016  
3   22-02-2016  
4   16-11-2015  

Now i want to return the past closest or equivalent date in 3rd column from tdates for each of the User.ID in mondays. For e.g the expected output is

Mondays       
ID  Monday      Closest Date
1   05-09-2016  29-09-2015
1   20-04-2015  16-04-2015
1   27-07-2015  11-05-2015
1   08-06-2015  11-05-2015
1   13-10-2014  28-11-2013
3   16-09-2013  NA
3   16-02-2015  NA
3   29-08-2016  01-02-2016
3   26-05-2014  NA
3   29-02-2016  01-02-2016
3   18-07-2016  01-02-2016
3   22-02-2016  01-02-2016
4   16-11-2015  29-05-2013

For ID = 1 & Monday = 05-09-2016

the past closest tdate is 29-09-2015 thus it'll get this date in Closest Date column

Note : If no transaction date is found to past or equivalent to monday's date fill NAs

This has to be done for a very large data set , any ideas how this can be done . I have tried this using a customized function as follows :

lasttxndate <- function(userid, mydate){
+     return(max(subset(tdates$Date.Asked, tdates$User.ID == userid & tdates$Date.Asked <= as.Date(mydate))))
+ }

But this isn't working out when using this with lapply' orsapply`.

2 Answers2

2
# date conversion
mondays$Monday <- as.Date(mondays$Monday, "%d-%m-%Y")
tdates$tdate <- as.Date(tdates$tdate, "%d-%m-%Y")

# convert to data.table
library(data.table) 
setDT(mondays) 
setDT(tdates)

# you need identical column names for join
tdates[, ID := User.ID, ]
tdates[, Monday := tdate, ]

tdates[mondays, on = c("ID", "Monday"), roll = Inf]

    User.ID      tdate ID     Monday
 1:       1 2015-09-29  1 2016-09-05
 2:       1 2015-04-16  1 2015-04-20
 3:       1 2015-05-11  1 2015-07-27
 4:       1 2015-05-11  1 2015-06-08
 5:       1 2013-11-28  1 2014-10-13
 6:      NA       <NA>  3 2013-09-16
 7:      NA       <NA>  3 2015-02-16
 8:       3 2016-02-01  3 2016-08-29
 9:      NA       <NA>  3 2014-05-26
10:       3 2016-02-01  3 2016-02-29
11:       3 2016-02-01  3 2016-07-18
12:       3 2016-02-01  3 2016-02-22
13:       4 2013-05-29  4 2015-11-16

tdate column gives you the desired dates

ExperimenteR
  • 4,453
  • 1
  • 15
  • 19
  • This works fine, could you please explain `tdates[mondays, on = c("ID", "Monday"), roll = Inf]` –  Dec 12 '16 at 10:43
  • @pankaj There are myriad of great answers on SO about data.table join. For rolling join read http://stackoverflow.com/a/27763960/3573401 . – ExperimenteR Dec 12 '16 at 13:00
  • also here http://stackoverflow.com/questions/12030932/rolling-joins-data-table-in-r – ExperimenteR Dec 12 '16 at 13:15
1

This code works well:

T.Dates <- data.frame( 
User.ID=c("1","1","1","1","1","1","1","1","1","3","4","4","4"),
tdate=as.Date(c("11-02-2013","04-03-2013","16-04-2015","03-05-2015","05-05-2015","11-05-2015","29-09-2015","26-11-2013","28-11-2013","01-02-2016","22-11-2012","25-04-2013","29-05-2013"),format="%d-%m-%Y"))


Mondays <- data.frame( 
  ID=c("1","1","1","1","1","3","3","3","3","3","3","3","4"),
  Monday=as.Date(c("05-09-2016","20-04-2015","27-07-2015","08-06-2015","13-10-2014","16-09-2013","16-02-2015","29-08-2016","26-05-2014","29-02-2016","18-07-2016","22-02-2016","16-11-2015"),format="%d-%m-%Y"))

Mondays$Closest.Date <- NA
Mondays$Closest.Date <- as.Date(Mondays$Closest.Date, format="%d-%m-%Y")

for(i in 1:nrow(Mondays)){
Mondays[i,"Closest.Date"] <- max(T.Dates$tdate[T.Dates$User.ID==Mondays$ID[i] & T.Dates$tdate <= Mondays[i,"Monday"]])  
}

The output:

> Mondays
   ID     Monday Closest.Date
1   1 2016-09-05   2015-09-29
2   1 2015-04-20   2015-04-16
3   1 2015-07-27   2015-05-11
4   1 2015-06-08   2015-05-11
5   1 2014-10-13   2013-11-28
6   3 2013-09-16         <NA>
7   3 2015-02-16         <NA>
8   3 2016-08-29   2016-02-01
9   3 2014-05-26         <NA>
10  3 2016-02-29   2016-02-01
11  3 2016-07-18   2016-02-01
12  3 2016-02-22   2016-02-01
13  4 2015-11-16   2013-05-29
Mario M.
  • 802
  • 11
  • 26