I have a DataFrame like below, it's a transaction log:
c_id t_type unit effective_date
1 enter 50 2/5/2020
2 exit 25 2/5/2020
1 exit 50 2/5/2020
2 enter 100 2/8/2020
3 enter 10 2/8/2020
1 enter 17 2/8/2020
3 exit 25 2/8/2020
3 exit 25 2/12/2020
I have a definition of active customer: a c_id who is entered units > exited_units.
For each day , I want to know how many active customers exists from beginning of the log.
Simplest way occur to mind is use for loop to consider each date as most recent day and get the count of active customers from beginning to that day, but I wanted more optimized scalable Pythonic/pandas way to solve this problem.
edit: My desired out put will be something like this:
effective_date active_count
2/5/2020 ?
2/8/2020 ?
2/12/2020 ?
2/13/2020 ?
2/14/2020 ?
2/19/2020 ?
2/20/2020 ?
replace question marks with active customer count of that day from beginning of the log. real issue here is For each day, we need to calculate the count from beginning of the log.