I am working in R. I have a dataset with the columns Vendor_ID, Bank_account_no and Date, and over 3 million rows. For each Vendor_ID I want to get the rows where the Bank_account_no changes in a particular way: for example from X to X to X (at least three times, possibly more) to Y (just once) and then back to X again, all within a three-month period. The changes occur at random, so the window is not a fixed number of rows for each vendor. I am using the rle function to get the run lengths of the different Bank_account_no values, but I am not sure how to build this logic in R for this many rows, given that it has to run separately for each Vendor_ID. Maybe data.table can help here. The input is as follows:
Vendor_ID Bank_account_no Date
dddd X 24-12-2018
dddd X 24-12-2018
dddd X 26-12-2018
dddd Y 27-12-2018
dddd X 28-12-2018
dddd X 29-12-2018
dddd X 29-12-2018
dddd X 31-12-2018
dddd X 24-01-2019
dddd Z 25-01-2019
dddd X 28-01-2019
dddd G 28-01-2019
dddd G 28-01-2019
eeee A 30-01-2019
eeee A 31-01-2019
eeee A 31-01-2019
eeee B 31-01-2019
eeee A 31-01-2019
The output should be:
Vendor_ID Bank_account_no Date Case
dddd X 24-12-2018 Case1
dddd X 24-12-2018 Case1
dddd X 26-12-2018 Case1
dddd Y 27-12-2018 Case1
dddd X 28-12-2018 Case1
dddd X 29-12-2018 Case2
dddd X 29-12-2018 Case2
dddd X 31-12-2018 Case2
dddd X 24-01-2019 Case2
dddd Z 25-01-2019 Case2
dddd X 28-01-2019 Case2
eeee A 30-01-2019 Case3
eeee A 31-01-2019 Case3
eeee A 31-01-2019 Case3
eeee B 31-01-2019 Case3
eeee A 31-01-2019 Case3
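
To make the rule concrete, here is a rough sketch of how it could be written on top of rle in base R. It is only one possible reading of the requirement, not a tested solution for the full data. The names `flag_switch_back`, `min_run` and `max_days` are made up for illustration. Three months is approximated as 92 days, and the rows are assumed to be in chronological order within each vendor already. It also assumes that once a row closes a case, the remaining rows of the same run can open the next case, which is what the expected output suggests (28-12-2018 ends Case1 and 29-12-2018 starts Case2).

```r
# Sample input from the question.
df <- data.frame(
  Vendor_ID = rep(c("dddd", "eeee"), times = c(13, 5)),
  Bank_account_no = c("X", "X", "X", "Y", "X", "X", "X", "X", "X", "Z",
                      "X", "G", "G", "A", "A", "A", "B", "A"),
  Date = as.Date(c("24-12-2018", "24-12-2018", "26-12-2018", "27-12-2018",
                   "28-12-2018", "29-12-2018", "29-12-2018", "31-12-2018",
                   "24-01-2019", "25-01-2019", "28-01-2019", "28-01-2019",
                   "28-01-2019", "30-01-2019", "31-01-2019", "31-01-2019",
                   "31-01-2019", "31-01-2019"), format = "%d-%m-%Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for ONE vendor, return a case number per row (0 = no case).
# min_run  = minimum number of leading rows with the same account (assumed 3)
# max_days = assumed stand-in for "three months"
flag_switch_back <- function(acct, date, min_run = 3L, max_days = 92) {
  case <- integer(length(acct))
  r <- rle(as.character(acct))
  n_runs <- length(r$lengths)
  starts <- cumsum(r$lengths) - r$lengths + 1L  # first row of every run
  lead_start <- starts  # first row of each run that is still free to open a case
  k <- 0L
  # Only interior runs can be the single "odd one out" in the middle.
  for (i in seq_len(max(0L, n_runs - 2L)) + 1L) {
    first <- lead_start[i - 1L]     # first usable row of the leading run
    back <- starts[i + 1L]          # row where the old account comes back
    lead_len <- starts[i] - first   # usable rows in the leading run
    span <- as.numeric(difftime(date[back], date[first], units = "days"))
    if (r$lengths[i] == 1L &&
        r$values[i - 1L] == r$values[i + 1L] &&
        lead_len >= min_run &&
        span <= max_days) {
      k <- k + 1L
      case[first:back] <- k
      # The return row is used up; the rest of that run may lead the next case.
      lead_start[i + 1L] <- back + 1L
    }
  }
  case
}

# Apply the helper vendor by vendor.
local <- integer(nrow(df))
for (rows in split(seq_len(nrow(df)), df$Vendor_ID)) {
  local[rows] <- flag_switch_back(df$Bank_account_no[rows], df$Date[rows])
}

# Turn the per-vendor case numbers into one running Case label.
keep <- local > 0L
key <- paste(df$Vendor_ID, local)[keep]
result <- df[keep, ]
result$Case <- paste0("Case", match(key, unique(key)))
print(result, row.names = FALSE)
```

On the sample data this keeps 16 of the 18 rows and drops the two G rows. For the full 3 million rows, the same helper could presumably be called per group in data.table, along the lines of `dt[, local := flag_switch_back(Bank_account_no, Date), by = Vendor_ID]`, instead of the explicit loop over `split()`.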