Create new column in dataframe using multiple columns of different types

Question

I have a DataFrame for the boardgame Terraforming Mars on which I want to do several calculations in pandas. The df has columns for points accumulated through different methods, as well as the game_id, player, corporation, etc. An example df is shown below:

>>> df.columns

Index(['Date', 'Game', 'Player', 'Coportation', 'TR', 'MILESTONES', 'AWARDS',
       'CARDS', 'GREENERY', 'CITIES+G', 'Total'],
      dtype='object')

>>> df.dtypes

Date           object
Game            int64
Player          int64
Coportation    object
TR              int64
MILESTONES      int64
AWARDS          int64
CARDS           int64
GREENERY        int64
CITIES+G        int64
Total           int64
dtype: object

>>> df.head()

          Date  Game   Player     Coportation  ...  CARDS  GREENERY  CITIES+G  Total
0          NaN     3   1             NaN       ...     41         2         3     90
1          NaN     3   2             NaN       ...      6         8         9     78
2          NaN     3   3             NaN       ...     26         5        11     90
3          NaN     4   1             NaN       ...     44         6         2     97
4          NaN     4   2             NaN       ...     22         8         2     77

I want to answer several questions, most of which are pretty simple: highest scoring game, lowest scoring game, mean game score, highest score by a player, lowest score by a player, player who has won the most, play who has lost the most, best corporation, worst corporation, etc.

For player who has won the most, I want to add a column winner for each game. In order to do this I need to check

for each game get highest total and corresponding Player
add corresponding player into column

How do I do that? I have checked pandas create new column based on values from other columns / apply a function of multiple columns, row-wise but it does not answer this question, as it's not just a few if statements in a function. I know this is likely relatively simple, but it's not seeming to coming to me.

Are there gonna be tied-winners for some games? Or do the games have exactly one winner each? — Quang Hoang, Jun 13 '22 at 21:52
It's possible there can be a tie, at least in point value. So in that case, yes, there can be. — DrakeMurdoch, Jun 13 '22 at 23:25
I think you are searching for a `groupby` followed by an `agg`. Did you already find that while searching for a solution? — Rabinzel, Jun 14 '22 at 05:48

score 0 · Answer 1 · answered Jun 14 '22 at 02:06

So I ended up using this function, which I wouldn't consider overly pythonic. It, did, however, do the trick:

def winners(df):
    
    '''
    Takes in a DataFrame and outputs a winners column.
    If there is a tie, it adds both winners to the column
    '''

    lst = []
    for x in range(df['Game'].max()+1):
        if x in df['Game']:
            lst_1 = []
            df_1 = df[df['Game'] == x].reset_index(drop=True)
            mx = df_1['Total'].max()
            win = ', '.join(df_1['Player'][df_1['Total'] == mx].tolist())
            lst_1.append(win)
            lst_2 = lst_1 * len(df_1.index)
            lst.extend(lst_2)

    df['Winner'] = lst

Using this, just run winners(df) and it will pop out a Winner column.

Create new column in dataframe using multiple columns of different types

1 Answers1