I have a dataframe (call it txn_df
) that contains monetary transaction records, here are the significant columns in this problem:
txn_year txn_month custid withdraw deposit
2011 4 123 0.0 100.0
2011 5 123 0.0 0.0
2011 6 123 0.0 0.0
2011 7 123 50.1 0.0
2011 8 123 0.0 0.0
Assume also that we have multiple customers here. withdraw
and deposit
0.0 value for both means no transaction has taken place. What I want to do is to produce a new column that indicates how many months has occurred since there was a transaction. Something similar to this:
txn_year txn_month custid withdraw deposit num_months_since_last_txn
2011 4 123 0.0 100.0 0
2011 5 123 0.0 0.0 1
2011 6 123 0.0 0.0 2
2011 7 123 50.1 0.0 3
2011 8 123 0.0 0.0 1
The only solution so far that I can think of is to produce a new column has_txn
(which is either 1/0 or True/False) when either one of withdraw
and deposit
has value > 0.0 but I can't continue from there.