Pandas - count things of one column by a condition of other

Question

I have a DataFrame d of games played of the game Go. The important columns for these are the player name, and if they won or not that particular game. How can I make a new DataFrame with a column that is Player name (without repeating names), the total games they played and the total number they won. Example:

import pandas as pd

d = {'Nombres': ['pepe','tito','pepe','pepe'], 'col2': ['Win','Lost','Win','Lost']}
df2 = pd.DataFrame(data=d)

This is what I currently do, counting the number of games played per user:

df3 = df2.groupby(['Jugador']).size().reset_index(name='count')

How can I add the column of how many games each player won?

This would be called *"aggregate Wins by Player"* or *"tabulate Wins and Matches by Player"*. Also, you could name 'col2' ('Win'/'Lost' column) 'Result'. — smci, Jan 24 '20 at 22:53

Umar.H · Answer 1 · 2020-01-24T18:48:17.927

3

Few ways (feel free to add more)

1

df2.groupby(['Nombres','col2'])['col2'].count().to_frame()
              col2
Nombres col2      
pepe    Lost     1
        Win      2
tito    Lost     1

2

pd.crosstab(df2.Nombres,df2.col2,df2.col2,aggfunc='count').fillna(0)
col2     Lost  Win
Nombres           
pepe      1.0  2.0
tito      1.0  0.0

3

df2.loc[df2['col2'] == 'Win']['Nombres'].value_counts()
pepe    2
Name: Nombres, dtype: int64

4

df2.loc[df2['col2'] == 'Win'].groupby('Nombres').size()
Nombres
pepe    2
dtype: int64

5

df2['win_counter'] = df2.groupby('Nombres')['col2'].apply(lambda x : x.eq('Win').cumsum())
print(df2)
     Nombres  col2    win_counter
0    pepe     Win              1
1    tito     Lost             0
2    pepe     Win              2
3    pepe     Lost             2

edited Jan 24 '20 at 18:48

answered Jan 24 '20 at 18:38

Umar.H

22,559
7
39
74

@ansev i hardly use pivot i most go for `pd.crosstab` but feel free to add :) – Umar.H Jan 24 '20 at 19:28
pandas crosstab uses pivot table internally, and if you check it sometimes it can be slow – ansev Jan 24 '20 at 19:43
@ansev that makes total sense, thanks for clarifying! – Umar.H Jan 24 '20 at 20:52

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

IIUC, `DataFrame.pivot_table`

new_df = (df2.pivot_table(index = 'Nombres',columns = 'col2',
                          values = 'col2',
                          aggfunc = 'size',
                          fill_value = 0)
            #.loc[:,['Win']] #if you want drop the column of Lost
             .assign(Total_Match=lambda x: x.sum(axis=1))
             .rename_axis(columns = None)
             .reset_index())
print(new_df)

Output

  Nombres  Lost  Win  Total_Match
0    pepe     1    2            3
1    tito     1    0            1

Pandas - count things of one column by a condition of other

2 Answers2

1

2

3

4

5

IIUC, DataFrame.pivot_table

IIUC, `DataFrame.pivot_table`