I have already removed rows where a company was not charged anything for a given period
(e.g., a row where revenue
== 0).
Here is an example of the billing data:
import numpy as np
import pandas as pd
data = {
'account_id': ['111','111','222','333','666','666','111','222','333','666','666'],
'company': ['initech','initech','jackson steinem & co','ingen','enron','enron','initech','jackson steinem & co','ingen','enron','enron'],
'billing_type': ['subscription','discount','subscription','subscription','subscription','discount','subscription','subscription','subscription','subscription','discount'],
'period': ['2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-10-31','2012-11-30','2012-11-30','2012-11-30','2012-11-30','2012-11-30'],
'revenue':[39.95,-39.95,199.95,299.95,499.95,-499.95,39.95,199.95,299.95,499.95,-499.95]
}
df = pd.DataFrame(data)
df['period'] = pd.to_datetime(df['period'],format='%Y-%m-%d')
This yields a dataframe like so:
In [16]: df
Out[16]:
account_id billing_type company period revenue
0 111 subscription initech 2012-10-31 39.95
1 111 discount initech 2012-10-31 -39.95
2 222 subscription jackson steinem & co 2012-10-31 199.95
3 333 subscription ingen 2012-10-31 299.95
4 666 subscription enron 2012-10-31 499.95
5 666 discount enron 2012-10-31 -499.95
6 111 subscription initech 2012-11-30 39.95
7 222 subscription jackson steinem & co 2012-11-30 199.95
8 333 subscription ingen 2012-11-30 299.95
9 666 subscription enron 2012-11-30 499.95
10 666 discount enron 2012-11-30 -499.95
What I need to do is remove the rows where revenue
adds up to zero for a given company
/period
. So, for instance, I need to remove all of Enron's rows but only the October 2012 period for Initech:
In [17]: df.groupby(['company','period'])['revenue'].sum()
Out[17]:
company period
enron 2012-10-31 0.00
2012-11-30 0.00
ingen 2012-10-31 299.95
2012-11-30 299.95
initech 2012-10-31 0.00
2012-11-30 39.95
jackson steinem & co 2012-10-31 199.95
2012-11-30 199.95
A number of other posts address similar cases, and I have not been able to find anything that exactly helps/explains how to accomplish this request.