Here is the use case I am trying to solve:
I have a DataFrame with the following columns:
- Name
- Date
- SubscriptionID
- Sku
- Type (sale or refund)
What I am trying to do is loop through the entire dataset, sorted by Date ascending.
As I go, the first occurrence of each (SubscriptionID, Sku) pair should get a new value, say interval_value, of 1. If the same pair comes up again later, interval_value should be incremented by 1 for a sale or decremented by 1 for a refund.
Essentially, I am trying to figure out how many times each subscription has purchased. A subscription can potentially have 2 SKUs, hence I would like to do this per SubscriptionID and Sku.
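To make the rule concrete, here is a minimal row-by-row sketch of it. The sample rows are made up for illustration, the helper name interval_by_loop is hypothetical, and it assumes the first record for every (SubscriptionID, Sku) pair is a sale.

```python
import pandas as pd

# Made-up sample rows, for illustration only.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Ann", "Ann"],
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-02-15", "2020-03-01", "2020-04-01"]
    ),
    "SubscriptionID": [1, 1, 2, 1, 1],
    "Sku": ["A", "A", "A", "A", "B"],
    "Type": ["sale", "sale", "sale", "refund", "sale"],
})


def interval_by_loop(frame):
    """Hypothetical helper: running count per (SubscriptionID, Sku), row by row."""
    # Process rows in ascending Date order; a stable sort keeps ties in input order.
    frame = frame.sort_values("Date", kind="stable")
    running = {}  # (SubscriptionID, Sku) -> current count
    values = []
    for sub_id, sku, kind in zip(frame["SubscriptionID"], frame["Sku"], frame["Type"]):
        key = (sub_id, sku)
        # Assumption: anything that is not a refund counts as a sale.
        step = -1 if kind == "refund" else 1
        running[key] = running.get(key, 0) + step
        values.append(running[key])
    return frame.assign(interval_value=values)


result = interval_by_loop(df)
print(result[["Date", "SubscriptionID", "Sku", "Type", "interval_value"]])
```

With these rows, subscription 1 / SKU A goes 1, 2 and then back to 1 after the refund, while the other two pairs each start at 1.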
In theory I can loop through the whole DataFrame and process it line by line. However, I am looking for how this could be accomplished in pandas, either with the apply method or in some other, more efficient way.
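One direction that might work without an explicit loop is a grouped cumulative sum: map each row to +1 or -1, then take a running total per (SubscriptionID, Sku). This is only a sketch under assumptions: the sample rows are made up, Type is assumed to contain exactly the strings "sale" and "refund", and the first record of each pair is assumed to be a sale.

```python
import pandas as pd

# Made-up sample rows, deliberately not in Date order.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Ann", "Ann"],
    "Date": pd.to_datetime(
        ["2020-03-01", "2020-01-01", "2020-02-15", "2020-02-01", "2020-04-01"]
    ),
    "SubscriptionID": [1, 1, 2, 1, 1],
    "Sku": ["A", "A", "A", "A", "B"],
    "Type": ["refund", "sale", "sale", "sale", "sale"],
})

# Sort by Date ascending so the running total follows chronological order.
df = df.sort_values("Date", kind="stable")

# +1 for a sale, -1 for a refund (assumes no other Type values occur).
step = df["Type"].map({"sale": 1, "refund": -1})

# Running total of the steps within each (SubscriptionID, Sku) group.
df["interval_value"] = step.groupby([df["SubscriptionID"], df["Sku"]]).cumsum()

print(df[["Date", "SubscriptionID", "Sku", "Type", "interval_value"]])
```

The grouping keys are passed as Series so no temporary column has to be added to the DataFrame and dropped afterwards.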
EDITED:
This is the logic I would like to implement (how to calculate the Interval column).
Looking forward to your response. Thanks.