This is my transaction data. It shows the transactions made from the accounts in from
column to the accounts in to
column with the date and the amount information
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1056317 7819 7454 2010-12-31 60.2
1056318 6164 7497 2010-12-31 107.5
1056319 7533 7492 2010-12-31 164.1
Now what I want to do is as follows: I want to go over every transaction row by row and for each account in from
column, I want to keep track of how much transaction amount they received over last 6 month at the time that particular transaction was made and want to save this information as a new column.(so this new column will describe total transaction amount received by the accounts in from
column in the last six months prior to the transaction date.)
For example:
In the very first line of data, for account 6644
, I should look into the to
column if 6644
got a transaction between the dates "2004-07-05"-"2005-01-01"
which is a 6-month period up until the date 2005-01-01
which is the date when the transaction -we are at- is made by 6644
. If there are such transactions that 6644
received, I should sum up them and add this sum information to a new column total_trx_amount_received_in_last_6month
as a value. Similarly, I should do the same for account 6753
and look for the transactions it got between the dates "2004-07-05"-"2005-01-01"
and sum them up and add this sum to the column total_trx_amount_received_in_last_6month
.
And, I should continue in this manner row by row in data.
So, how can I achieve this for the whole data?
PS: In the date interval "2004-07-05"-"2005-01-01"
, "2005-01-01"
is the transaction date, to get the second date "2004-07-05"
I subtract 180 days(approx. 6 months) from the transaction date "2005-01-01"
.
To see it better, I provide the following data:
I will also show how the output will be. Let's say we only have this many of transactions. Just consider 5370
account here, since other accounts 8605,6390,8934
don't receive any transactions here.
id from to date amount total_trx_amount_received_in_last_6month
<int> <fctr> <fctr> <date> <dbl> <dbl>
18529 5370 9356 2005-05-31 24.4 0.0
13742 5370 5605 2005-08-05 7618.0 0.0
9913 5370 8567 2005-09-12 21971.0 0.0
956 8605 5370 2005-10-05 5245.0 0.0
2557 5370 5636 2005-11-12 2921.0 5245.0
1602 6390 5370 2005-11-26 8000.0 0.0
18669 5370 8933 2005-11-30 169.2 (5245.0+8000.0)=13245
35900 5370 8483 2006-01-31 71.5 (5245.0+8000.0)=13245
48667 8934 5370 2006-03-31 14.6 0.0
51341 5370 7626 2006-04-11 4214.0 (8000.0+14.6)=8014.6
Here is what I did:
First note that this small data above is sorted by date
in ascending order.
In the first line, for account 5370
in from column
I look at the past data to see if 5370
received any transaction between the dates "2004-12-02"-"2005-05-31"
. Since the first line is the very first transaction, obviously there's no transaction received by 5370
before the date "2005-05-31"
, so I log 0.0
into the total_trx_amount_received_in_last_6month
column respectively. In the second line, for account 5370
in from column
, again there's no transaction received by 5370
between the dates "2005-02-06"-"2005-08-05"
, so I log 0.0
into the total_trx_amount_received_in_last_6month
column. Similarly, I log 0.0
in the 3rd and 4th lines for accounts 5370
and 8605
respectively. In the fifth line, for account 5370
in from column
, there's a transaction received by 5370
between the dates "2005-05-16"-"2005-11-12"
, which is received in "2005-10-05"
(in the 4th line of the data) with the amount 5245.0
so I log 5245.0
into the total_trx_amount_received_in_last_6month
column. In the sixth line, for account 6390
in from column
, there's no transaction received by 6390
between the dates "2005-05-30"-"2005-11-26"
, so I log 0.0
into the total_trx_amount_received_in_last_6month
column. And it goes like this for all the rows of the data.
dput() output of the data:
structure(list(id = c(18529L, 13742L, 9913L, 956L, 2557L, 1602L,
18669L, 35900L, 48667L, 51341L, 53713L, 60126L, 60545L, 65113L,
66783L, 83324L, 87614L, 88898L, 89874L, 94765L, 100277L, 101587L,
103444L, 108414L, 113319L, 121516L, 126607L, 130170L, 131771L,
135002L, 149431L, 157403L, 157645L, 158831L, 162597L, 162680L,
163901L, 165044L, 167082L, 168562L, 168940L, 172578L, 173031L,
173267L, 177507L, 179167L, 182612L, 183499L, 188171L, 189625L,
193940L, 198764L, 199342L, 200134L, 203328L, 203763L, 204733L,
205651L, 209672L, 210242L, 210979L, 214532L, 214741L, 215738L,
216709L, 220828L, 222140L, 222905L, 226133L, 226527L, 227160L,
228193L, 231782L, 232454L, 233774L, 237836L, 237837L, 238860L,
240223L, 245032L, 246673L, 247561L, 251611L, 251696L, 252663L,
254410L, 255126L, 255230L, 258484L, 258485L, 259309L, 259910L,
260542L, 262091L, 264462L, 264887L, 264888L, 266125L, 268574L,
272959L), from = c("5370", "5370", "5370", "8605", "5370", "6390",
"5370", "5370", "8934", "5370", "5635", "6046", "5680", "8026",
"9037", "5370", "7816", "8046", "5492", "8756", "5370", "9254",
"5370", "5370", "7078", "6615", "5370", "9817", "8228", "8822",
"5735", "7058", "5370", "8667", "9315", "6053", "7990", "8247",
"8165", "5656", "9261", "5929", "8251", "5370", "6725", "5370",
"6004", "7022", "7442", "5370", "8679", "6491", "7078", "5370",
"5370", "5370", "5658", "5370", "9296", "8386", "5370", "5370",
"5370", "9535", "5370", "7541", "5370", "9621", "5370", "7158",
"8240", "5370", "5370", "8025", "5370", "5370", "5370", "6989",
"5370", "7059", "5370", "5370", "5370", "9121", "5608", "5370",
"5370", "7551", "5370", "5370", "5370", "5370", "9163", "9362",
"6072", "5370", "5370", "5370", "5370", "5370"), to = c("9356",
"5605", "8567", "5370", "5636", "5370", "8933", "8483", "5370",
"7626", "5370", "5370", "5370", "5370", "5370", "9676", "5370",
"5370", "5370", "5370", "9105", "5370", "9772", "6979", "5370",
"5370", "7564", "5370", "5370", "5370", "5370", "5370", "8744",
"5370", "5370", "5370", "5370", "5370", "5370", "5370", "5370",
"5370", "5370", "7318", "5370", "8433", "5370", "5370", "5370",
"7122", "5370", "5370", "5370", "8566", "6728", "9689", "5370",
"8342", "5370", "5370", "5614", "5596", "5953", "5370", "7336",
"5370", "7247", "5370", "7291", "5370", "5370", "6282", "7236",
"5370", "8866", "8613", "9247", "5370", "6767", "5370", "9273",
"7320", "9533", "5370", "5370", "8930", "9343", "5370", "9499",
"7693", "7830", "5392", "5370", "5370", "5370", "7497", "8516",
"9023", "7310", "8939"), date = structure(c(12934, 13000, 13038,
13061, 13099, 13113, 13117, 13179, 13238, 13249, 13268, 13296,
13299, 13309, 13314, 13391, 13400, 13404, 13409, 13428, 13452,
13452, 13460, 13482, 13493, 13518, 13526, 13537, 13542, 13544,
13596, 13616, 13617, 13626, 13633, 13633, 13639, 13642, 13646,
13656, 13660, 13664, 13667, 13669, 13677, 13686, 13694, 13694,
13707, 13716, 13725, 13738, 13739, 13746, 13756, 13756, 13756,
13761, 13769, 13770, 13776, 13786, 13786, 13786, 13791, 13799,
13806, 13813, 13817, 13817, 13817, 13822, 13829, 13830, 13836,
13847, 13847, 13847, 13852, 13860, 13866, 13871, 13878, 13878,
13878, 13882, 13883, 13883, 13887, 13887, 13888, 13889, 13890,
13891, 13895, 13896, 13896, 13899, 13905, 13909), class = "Date"),
amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5,
14.6, 4214, 14.6, 13920, 14.6, 24640, 1600, 261.1, 16400,
3500, 2700, 19882, 182, 14.6, 16927, 25653, 3059, 2880, 9658,
4500, 12480, 14.6, 1000, 3679, 34430, 12600, 14.6, 19.2,
4900, 826, 3679, 2100, 38000, 79, 11400, 21495, 3679, 200,
14.6, 100.6, 3679, 5300, 108.9, 3679, 2696, 7500, 171.6,
14.6, 99.2, 2452, 3679, 3218, 700, 69.7, 14.6, 91.5, 2452,
3679, 2900, 17572, 14.6, 14.6, 90.5, 2452, 49752, 3679, 1900,
14.6, 870, 85.2, 2452, 3679, 1600, 540, 14.6, 14.6, 79, 210,
2452, 28400, 720, 180, 420, 44289, 489, 3679, 840, 2900,
150, 870, 420, 14.6)), row.names = c(NA, -100L), class = "data.frame")
(I turned from
and to
columns into character, since they had a huge number of levels the output would take so much space otherwise)