I have a DataFrame for the boardgame Terraforming Mars on which I want to do several calculations in pandas. The df has columns for points accumulated through different methods, as well as the game_id
, player
, corporation
, etc. An example df is shown below:
>>> df.columns
Index(['Date', 'Game', 'Player', 'Coportation', 'TR', 'MILESTONES', 'AWARDS',
'CARDS', 'GREENERY', 'CITIES+G', 'Total'],
dtype='object')
>>> df.dtypes
Date object
Game int64
Player int64
Coportation object
TR int64
MILESTONES int64
AWARDS int64
CARDS int64
GREENERY int64
CITIES+G int64
Total int64
dtype: object
>>> df.head()
Date Game Player Coportation ... CARDS GREENERY CITIES+G Total
0 NaN 3 1 NaN ... 41 2 3 90
1 NaN 3 2 NaN ... 6 8 9 78
2 NaN 3 3 NaN ... 26 5 11 90
3 NaN 4 1 NaN ... 44 6 2 97
4 NaN 4 2 NaN ... 22 8 2 77
I want to answer several questions, most of which are pretty simple: highest scoring game, lowest scoring game, mean game score, highest score by a player, lowest score by a player, player who has won the most, play who has lost the most, best corporation, worst corporation, etc.
For player who has won the most
, I want to add a column winner
for each game
. In order to do this I need to check
for each game get highest total and corresponding Player
add corresponding player into column
How do I do that? I have checked pandas create new column based on values from other columns / apply a function of multiple columns, row-wise but it does not answer this question, as it's not just a few if
statements in a function. I know this is likely relatively simple, but it's not seeming to coming to me.