Suppose I have an ordered data frame that looks like this:
df <- data.frame(
  customer    = c('cust1', 'cust1', 'cust2', 'cust3', 'cust3'),
  start_month = as.Date(c('2016-03-01', '2017-08-01', '2016-03-01', '2017-07-01', '2017-10-01')),
  price       = c(29, 29, 59, 99, 59),
  end_month   = as.Date(c('2017-08-01', NA, '2017-09-01', '2017-09-01', NA))
)
How can I write an R script that applies the following business rule: if a customer's transaction starts in the same month that their previous transaction ended, and the price did not change, remove the later transaction. Otherwise, keep the transaction. The resulting data frame would look like this:
new_df <- data.frame(
  customer    = c('cust1', 'cust2', 'cust3', 'cust3'),
  start_month = as.Date(c('2016-03-01', '2016-03-01', '2017-07-01', '2017-10-01')),
  price       = c(29, 59, 99, 59),
  end_month   = as.Date(c(NA, '2017-09-01', '2017-09-01', NA))
)
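In this example, cust1's 2017-08-01 transaction is filtered out because it starts in the month the previous transaction ended and the price is the same. Note that the remaining cust1 row then has an end date of NA, i.e. it takes the end date of the removed row. cust3's 2017-10-01 transaction is kept because the price is different.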
How can I do this in R?
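For reference, here is a minimal base R sketch of the rule as I understand it. It assumes the rows are already sorted by customer and start_month, that "same month" means the same calendar year and month, and that the surviving row should inherit the end_month of the removed row (inferred from the expected output above). The helper names `prev`, `is_dup`, `first_idx`, `last_idx` and `result` are only illustrative.

```r
df <- data.frame(
  customer    = c('cust1', 'cust1', 'cust2', 'cust3', 'cust3'),
  start_month = as.Date(c('2016-03-01', '2017-08-01', '2016-03-01', '2017-07-01', '2017-10-01')),
  price       = c(29, 29, 59, 99, 59),
  end_month   = as.Date(c('2017-08-01', NA, '2017-09-01', '2017-09-01', NA))
)

n <- nrow(df)

# Index of the preceding row (NA for the first row).
# Assumption: df is sorted by customer, then start_month.
prev <- c(NA, seq_len(n - 1))

# A row is a continuation if it has the same customer and price as the
# previous row and starts in the year-month in which that row ended.
is_dup <- !is.na(prev) &
  df$customer == df$customer[prev] &
  df$price == df$price[prev] &
  format(df$start_month, "%Y-%m") == format(df$end_month[prev], "%Y-%m")

# Comparisons against a missing end_month yield NA; treat those as "not a continuation".
is_dup[is.na(is_dup)] <- FALSE

# Keep the first row of each run and take end_month from the last row of that run.
first_idx <- which(!is_dup)
last_idx  <- c(first_idx[-1] - 1, n)

result <- df[first_idx, ]
result$end_month <- df$end_month[last_idx]
rownames(result) <- NULL

print(result)
```

Taking end_month from the last row of each run also covers a customer who renews at the same price several times in a row, although I have not checked that case against real data.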