import pandas as pd
df = pd.DataFrame({'zip,company': ["46062|A","11236|B","11236|C","11236|C","11236|C","11236|A","11236|A","11236|A","11236|B","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A","11236|B","11236|A","11236|A"],
'goodbadscore': ["good","bad","bad","good","good","bad","bad","good","good","good","bad","good","good","good","good","bad","bad","good"],
'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
)
print(df)
----- Update: added the sample data frame above, as suggested -----
I tried to produce the result in Excel, but COUNTIF and COUNTIFS freeze my desktop, and even when they work, the calculation takes several minutes. I hope to get some help and direction.
Here is what I am trying to achieve:
I want to score each company's reputation in several zip codes based on the collected data. The columns I need to produce are:
- countinzipcode
- countgoodscoreinzip
- dividegoodscore% (column 2 divided by column 1)
- ranking
I was able to produce 1:
op = df.groupby(['zip,company'])['zip,company'].count()
I have difficulty with 2: I want to keep the output from 1, but the result becomes 0 after apply. Column 2 should count only the "good" rows.
op = op.groupby(['zip,company'])[['zip,company','countgoodscoreunderzip']].apply(lambda x: x[x=='good'].count())
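For reference, here is a minimal sketch of one way columns 1 and 2 might be produced together, so the count from 1 is not lost. It runs on a small made-up frame rather than the full sample, the `is_good` helper column is a hypothetical name, and the output column names are taken from the list above:

```python
import pandas as pd

# Small illustrative frame (not the full sample), using the same column names.
df = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
                    "11236|B", "46062|A", "11236|B"],
    'goodbadscore': ["good", "bad", "bad", "good", "good", "good", "bad"],
})

# Helper boolean column (hypothetical name): True where the score is "good".
df['is_good'] = df['goodbadscore'].eq('good')

# Named aggregation: total rows per group and number of "good" rows per group.
counts = df.groupby('zip,company').agg(
    countinzipcode=('goodbadscore', 'size'),
    countgoodscoreinzip=('is_good', 'sum'),
)
print(counts)
```

Summing a boolean column counts its True values, which is why both counts can come out of a single groupby on the original frame instead of a second groupby on the result of 1.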
Then for 3, I guess it is a matter of dividing 2 by 1.
For 4, I have no idea yet how to rank in pandas; a simple ranking would be enough.
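A rough sketch of how 3 and 4 might look, again on a small made-up frame. It assumes a single overall ranking where rank 1 is the highest share of "good" rows; if the ranking should restart within each zip code, a grouped rank would be needed instead:

```python
import pandas as pd

# Small illustrative frame (not the full sample), using the same column names.
df = pd.DataFrame({
    'zip,company': ["46062|A", "11236|B", "11236|C", "11236|C",
                    "11236|B", "46062|A", "11236|B"],
    'goodbadscore': ["good", "bad", "bad", "good", "good", "good", "bad"],
})

# The mean of a boolean Series is the fraction of True values,
# so this gives "good" count divided by total count per group.
ratio = (
    df['goodbadscore'].eq('good')
      .groupby(df['zip,company'])
      .mean()
      .rename('dividegoodscore')
)

result = ratio.to_frame()
# Rank 1 = highest share; method='min' gives tied groups the same rank.
result['ranking'] = (
    result['dividegoodscore'].rank(ascending=False, method='min').astype(int)
)
print(result.sort_values('ranking'))
```

The share here is a fraction between 0 and 1; multiplying by 100 would turn it into a percentage.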
The picture of the Excel sheet shows the ideal output (updated with a sample data frame).
Thanks for reading.