Pandas - where have I 'Chained' assignment?

Question

I'm new to Python and have been displayed the 'SettingWithCopyWarning' I've tried to read and understand from other posts and websites, without any success. Would anybody be able to help me identify where I have chained assignment.

I think it is 'df_grouped = df1.groupby(["Menu Item"]).sum()'. If so how do I resolve the issue?

def get_max_item(startwk, endwk):
    df1 = pd.read_csv('data.csv')
    df_grouped = df1.groupby(['Item']).sum()
    df_date_extract = df_grouped.loc[:,startwk:endwk]
    df_date_extract.loc[:,'Row_Total'] = df_date_extract.sum(numeric_only=True, axis=1)
    df_max_item = df_date_extract[df_date_extract['Row_Total']==df_date_extract['Row_Total'].max()]
    return df_max_item.index[0]

I'm trying to group by item and then add a total to each row. Once I've done that I want to find the row with the largest total and return the item name as a string.

The data is structured....

Item	WK1	.....	Wk 50
item1	34	.....	45
item2	23	.....	67
item1	12	.....	5
item2	45	.....	12

The error....

......:125: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_date_extract.loc[:,'Row_Total'] = df_date_extract.sum(numeric_only=True, axis=1)

Where is this coming from? That looks like a style warning from a linter, not a python error. — Barmar, Jun 09 '23 at 20:32
It's what is displayed when I run my code in the VS Codes terminal — Neil White, Jun 09 '23 at 20:33
And the message tells you exactly which line it's warning about. It's 2 lines after the one you think it is. — Barmar, Jun 09 '23 at 20:33
Although I still got the error when I commented lines back to the line I thought it was originally — Neil White, Jun 09 '23 at 20:35

score 1 · Answer 1 · answered Jun 09 '23 at 21:18

You can try this simplified version :

def get_max_item(startwk, endwk):
    totals = (
        pd.read_csv("data.csv", index_col=0)
            .loc[:, startwk:endwk]
            .stack() #this make a Series
            .groupby(level=0).sum()
    )
    return list(*totals.loc[[totals.idxmax()]].items())

score 0 · Answer 2 · answered Jun 09 '23 at 20:49

Solved my problem. Created a new data frame to hold grouped item totals and used idxmax() and max() to get the values I wanted. returned an array instead:

def get_max_item(startwk, endwk):
  df1 = pd.read_csv("data.csv")
  df_grouped = df1.groupby(["Item"]).sum()
  df_date_extract = df_grouped.loc[:,startwk:endwk]
  df_totals = df_date_extract.sum(numeric_only=True, axis=1)
  most_popular_item = df_totals.idxmax()
  most_popular_item_total = df_totals.max()
  return [most_popular_item,most_popular_item_total]

Pandas - where have I 'Chained' assignment?

2 Answers2