This is the followup question of my question here:
This is my transaction data
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1055597 9866 9736 2010-12-31 278.9
1053519 9868 8644 2010-12-31 242.8
1052790 9869 8399 2010-12-31 372.2
Now, for each account in the from
column, I want to compute how much transaction amount they received over last 6 month at the time the transaction was made. To do this:
df <- data # df is just a copy of "data"
setDT(df)[, total_trx_amount_received_in_last_6month := sapply(date, function(x)
sum(amount[between(date, x-180, x)])), to]
# since I want to merge "df" and "data" based on the columns "from" and "date", I change the name of the column "to" and make it "from"
df <- select(df, to,date,total_trx_amount_received_in_last_6month) %>% rename(from=to)
df
from date total_trx_amount_received_in_last_6month
<fctr> <date> <dbl>
7468 2005-01-04 700.0
6213 2005-01-08 12032.0
7517 2005-01-10 1000.0
6143 2005-01-12 4976.0
6254 2005-01-14 200.0
6669 2005-01-20 200.0
6934 2005-01-24 72160.0
9240 2005-01-26 21061.0
6374 2005-01-30 1000.0
6143 2005-01-31 4989.4
Now I want to add this new column total_trx_amount_received_in_last_6month
into the original data
. So, I should merge these two dataframes data
and df
by the columns from
and date
but the matching criteria for date is a range of values, not a single value. For example for account 7468
if the original data
contains a transaction 7468
made and the transaction date falls into interval of "2004-07-08"-"2005-01-04"
(which is the period of last 6 months, starting from "2005-01-04"
), then corresponding value 700.0
in df$total_trx_amount_received_in_last_6month
should be added to the data$total_trx_amount_received_in_last_6month
How can I do that?