
I have the following Python function:

def compute_average_fg_rating(df, mask=''):
    df = df[['HorseId', 'FGrating']]
    if len(mask) == 0:
        df.loc['cumsum'] = df.groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift(fill_value=0).cumsum())
        return df.loc['cumsum'] / df.groupby('HorseId')['FGrating'].cumcount()
    else:
        return df.loc[mask].groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift().expanding().mean())

When I try to run the code, I get the "A value is trying to be set on a copy of a slice from a DataFrame" warning at the line:

        df.loc['cumsum'] = df.groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift(fill_value=0).cumsum())

I cannot see where the problematic code is. Can you help me?

Bogdan Doicin

2 Answers


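This is because df.loc['cumsum'] addresses a row labelled 'cumsum', not a column, so the assignment tries to add a new row to a DataFrame that pandas regards as a slice of the original. Assign to a column instead. In the if branch, change the code like this: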

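        # Running sum of each horse's previous ratings (current race excluded)
        cumsum_df = df.groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift(fill_value=0).cumsum())
        # Store the result in a 'cumsum' column, aligned on the original index
        df.loc[cumsum_df.index, 'cumsum'] = cumsum_df
        return df['cumsum'] / df.groupby('HorseId')['FGrating'].cumcount()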

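This should solve your problem. If the warning persists, it is because df was itself created by selecting columns from another DataFrame; adding .copy() to that selection (df = df[['HorseId', 'FGrating']].copy()) makes df independent of the original.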
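For reference, here is a minimal, self-contained sketch of the no-mask branch with the selection copied explicitly and the running sum stored in a column. It uses transform instead of apply, which is my substitution and not taken from the question. The sample data and the Track column are made up, and the mask branch is left out.

```python
import pandas as pd


def compute_average_fg_rating(df):
    # .copy() yields an independent DataFrame, so the assignment below
    # cannot be mistaken for a write to a slice of the caller's data.
    df = df[['HorseId', 'FGrating']].copy()
    # Sum of each horse's previous ratings (the current race is excluded).
    df['cumsum'] = df.groupby('HorseId')['FGrating'].transform(
        lambda x: x.shift(fill_value=0).cumsum())
    # Number of previous races per horse: 0, 1, 2, ...
    prior_races = df.groupby('HorseId').cumcount()
    # 0 / 0 gives NaN for each horse's first race.
    return df['cumsum'] / prior_races


# Made-up sample data for illustration only.
races = pd.DataFrame({
    'HorseId': [1, 1, 1, 2, 2],
    'FGrating': [10.0, 20.0, 30.0, 5.0, 15.0],
    'Track': ['A', 'B', 'A', 'C', 'C'],
})
result = compute_average_fg_rating(races)
print(result.tolist())  # [nan, 10.0, 15.0, nan, 5.0]
```

The caller's DataFrame is left unchanged: races gains no 'cumsum' column.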

Phoenix

The warning message is indicating that the operation df.loc['cumsum'] = ... is trying to set values on a copy of a slice from the original DataFrame df, instead of the original DataFrame itself.

This can happen when you select a subset of the original DataFrame using indexing or slicing, and then modify that subset in place. In some cases, pandas returns a copy of the subset instead of a view of the original DataFrame, and trying to modify this copy can result in unexpected behavior.

In this case, the issue is with the line df = df[['HorseId', 'FGrating']], which creates a new DataFrame that is a subset of the original df by selecting only the columns 'HorseId' and 'FGrating'. This creates a copy of the subset, not a view of the original DataFrame.

To fix the warning message, you can change how the columns are selected. One way to do this is to use the loc accessor to select both rows and columns in the same operation:

df = df.loc[:, ['HorseId', 'FGrating']]

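This selects all rows (:) and the columns 'HorseId' and 'FGrating'. Note that selecting a list of columns produces a new DataFrame (a copy) with either syntax, not a view. The loc form may avoid the warning in some pandas versions, but appending .copy() to the selection makes the intent explicit and is the more reliable fix.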
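Whether a selection shares data with the original can be checked directly: modify the selection and look at the original afterwards. The sketch below does this for the loc form. The sample data and the Track column are made up, and the warnings filter is only there because some pandas versions may emit a warning on the assignment.

```python
import warnings

import pandas as pd

# Made-up sample data for illustration only.
original = pd.DataFrame({
    'HorseId': [1, 2],
    'FGrating': [10.0, 20.0],
    'Track': ['A', 'B'],
})

# Select a list of columns through the loc accessor.
selected = original.loc[:, ['HorseId', 'FGrating']]

with warnings.catch_warnings():
    # Silence any warning a given pandas version may raise here.
    warnings.simplefilter('ignore')
    selected.loc[0, 'FGrating'] = 0.0

print(original['FGrating'].tolist())  # [10.0, 20.0]
print(selected['FGrating'].tolist())  # [0.0, 20.0]
```

The original keeps its value, so the selection holds its own data.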

With this change, the modified function would be:

def compute_average_fg_rating(df, mask=''):
    df = df.loc[:, ['HorseId', 'FGrating']]
    if len(mask) == 0:
        # Assign to a column; df.loc['cumsum'] would address a row labelled 'cumsum'
        df['cumsum'] = df.groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift(fill_value=0).cumsum())
        return df['cumsum'] / df.groupby('HorseId')['FGrating'].cumcount()
    else:
        return df.loc[mask].groupby('HorseId', group_keys=False)['FGrating'].apply(
            lambda x: x.shift().expanding().mean())

This should resolve the warning message.

Bogdan Doicin