pandas dataframe aggregate calculation

Question

I have a pandas dataframe containing sports matches:

Winner      Loser          
A           B
B           A 
A           C

i want to have win-loss statistics for each player (i.e. A, B, and C). so for A the result should be 2-1. for B it should be 1-1 and for C it should be 0-1.

i know how to calculate this via iterating line-by-line over the dataframe with:

for index, match in df.iterrows():
    //code for calculating win-loss here

but I am sure that there is a more pythonic/pandas-ish way to do this? any hints on this are appreciated.

Simon Gibbons · Accepted Answer · 2015-05-14T11:28:41.193

2

You can use the groupby method with the size aggregate to do this

for example

print df.groupby('Loser').size()

Would yield a dataframe with the counts of number of losses.

Loser
A         1
B         1
C         1
dtype: int64

You can then combine these into the score counts as follows (using the fillna method to set a default value if a team has no wins or losses)

wins = df.groupby('Winner').size()
losses = df.groupby('Loser').size()

scores = pd.DataFrame({'Wins' : wins, 'Losses' : losses}).fillna(0)

Yielding the final score counts as

   Losses  Wins
A       1     2
B       1     1
C       1     0

edited May 14 '15 at 11:28

answered May 14 '15 at 11:18

Simon Gibbons

6,969
1
21
34

thanks. perfect solution. now i just want to beautify it by changing wins-losses columns. – beta May 14 '15 at 11:39
1

to answer my question in above comment. this worked: `scores = scores.reindex_axis(['wins','losses'], axis=1)` – beta May 14 '15 at 11:41

Mike Müller · Answer 2 · 2015-05-14T11:46:08.913

On way of doing it:

win = df.groupby('Winner').count()
los = df.groupby('Loser').count()
score = pd.DataFrame([win.Loser, los.Winner])
score

gives:

        A   B   C
Loser   2   1   0
Winner  1   1   1

and:

score.T

shows it transposed:

    Loser   Winner
A   2       1
B   1       1
C   0       1

This is the dataframe used above:

df = pd.DataFrame({'Winner': list('ABA'), 
                   'Loser': list('BAC')})  

df

  Loser  Winner
0 B      A
1 A      B
2 C      A

All in one line:

pd.DataFrame([df.groupby('Winner').count().Loser, 
              df.groupby('Loser').count().Winner]).fillna(0).T

results in:

    Loser   Winner
A   2       1
B   1       1
C   0       1

hm. the code seems to get stuck at "los = df.groupby('Loser').count()" ? — beta, May 14 '15 at 11:42

score 0 · Answer 3 · answered May 14 '15 at 11:21

What format do you want your results in?

A simple manner to count wins and losses would be to use collections.Counter:

import pandas as pd
from collections import Counter

df=pd.DataFrame([['A','B'],['B','C'],['A','C']], columns=['winner','loser'])

win_counts = Counter(df['winner'])

win_counts is a dictionary like the one below:

Counter({'A': 2, 'B': 1})

Still, I prefer Simon Gibbons answer above as it does not require additional modules.

pandas dataframe aggregate calculation

3 Answers3

Linked