0

I have a dataset of hockey statistics and I want to apply a weight multiplier to certain statistics based on certain conditions.

A snippet of my dataset:

    Player            Pos    GP      G     GF/GP    S     Shots/GP    S%    TOI/GP
2   Andrew Cogliano   1.0   79.2    11.0    0.1   126.8      1.6     8.3    14.44
12  Artturi Lehkonen  2.0   73.0    14.6    0.2   158.6      2.2     9.3    15.29
28  Cale Makar        4.0   59.3    16.0    0.3   155.0      2.6     9.8    23.67
31  Darren Helm       1.0   66.6    10.5    0.2   125.0      1.9     8.6    14.37
61  Gabriel Landeskog 2.0   72.0    24.3    0.3   196.1      2.7     12.8   19.46
103 Nathan MacKinnon  1.0   73.8    27.8    0.4   274.4      3.7     9.9    19.69

What I am trying to do is create a function that multiplies 'G', 'GF/GP', 'S', and 'Shots/GP' by a specific weight - 1.1 for example. But I want to only do that for players based on two categories:

  • Defence ('Pos' = 4.0) with 50 or more games ('GP') and 20 min or more time on ice per game ('TOI/GP')
  • Offense ('Pos' != 4.0) with 50 or more games ('GP') and 14 min or more time on ice per game ('TOI/GP')

I can identify these groups by:

def_cond = df.loc[(df["Pos"]==4.0) & (df["GP"]>=50) & (df["TOI/GP"] >=20.00)]
off_cond = df.loc[(df["Pos"]!=4.0) & (df["GP"]>=50) & (df["TOI/GP"] >=14.00)]

Output for def_cond:

    Player        Pos   GP     G    GF/GP   S     Shots/GP   S%   TOI/GP
28  Cale Makar    4.0   59.3  16.0   0.3   155.0    2.6     9.8   23.67
41  Devon Toews   4.0   58.8  8.2    0.1   120.5    2.1     6.7   22.14
45  Erik Johnson  4.0   67.4  7.3    0.1   140.9    2.1     5.1   22.22
112 Samuel Girard 4.0   68.0  4.4    0.1   90.8     1.3     5.0   20.75

Issue:

What I want to do is take this output and multiply 'G', 'GF/GP', 'S', and 'Shots/GP' by a weight value - again 1.1 for example.

I have tried various things such as:

if def_cond == True:
        df[["G", "GF/GP", S", "Shots/GP"]].multiply(1.1, axis="index")

Or simply

if def_cond == True:
        df["G"] = (df["G"]*1.1)

Pretty much everything I try results in the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am new to this so any and all advice is welcome!

Josh
  • 7
  • 2
  • You can try to use `apply()` method of DataFrame, [here](https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe) is a simple example how it can look like – Masha Sep 25 '22 at 21:27
  • I have tried that a couple of times as well, but am not figuring out a way to use that within my specific conditions and not applying to every row in the df. Whenever I try to use in conjunction with an if/then statement it doesn't work. – Josh Sep 25 '22 at 22:26

1 Answers1

0

I would try this:

def f(df,weight):
    for i in df.index:
        if (
            (df.loc[i,'Pos']==4.0 and df.loc[i,'GP']>=50 
                                  and df.loc[i,'TOI/GP']>=20)
            or
            (df.loc[i,'Pos']!=4.0 and df.loc[i,'GP']>=50 
                                  and df.loc[i,'TOI/GP']>=14)
            ):
            df.loc[i,['G','GF/GP','S','Shots/GP']]*=weight

Though, i'm pretty sure, it is not the best solution...