SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame while clustering

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'month': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2'],
    'X1': [30, 42, 25, 32, 12, 10, 4, 6, 5, 10, 24, 21],
    'X2': [10, 76, 100, 23, 65, 94, 67, 24, 67, 54, 87, 81],
    'X3': [23, 78, 95, 52, 60, 76, 68, 92, 34, 76, 34, 12],
})
df

I run the code shown further down on this DataFrame, but it produces the following warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Code:

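from sklearn.cluster import KMeans

cols = df.columns[2:4]  # the feature columns 'X2' and 'X3'
mapping = {0: 'weak', 1: 'average', 2: 'best'}


def cluster(X):
    # Fit k-means on one month's rows, then rank the clusters
    # by the sum of their per-column means (0 = lowest).
    k_means = KMeans(n_clusters=3).fit(X)
    return X.groupby(k_means.labels_)\
            .transform('mean').sum(1)\
            .rank(method='dense').sub(1)\
            .astype(int).to_frame()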

df['Cluster_id'] = df.groupby('month')[cols].apply(cluster)
df['Cluster_cat'] = df['Cluster_id'].map(mapping)

How can I fix this? Thank you.

Your code looks fine to me - I ran it and cannot reproduce the warning message. As far as I understand, this warning would only occur if you tried to set a slice of the DataFrame. For example, trying to apply your cluster function to a slice of the DataFrame where the month is '1' like this `df[df['month'] == '1']['Cluster_id'] = df[df['month'] == '1'].groupby('month')[cols].apply(cluster)` would clearly throw `SettingWithCopyWarning` — Derek O, May 04 '21 at 06:56

Derek O · Answer 1 · 2021-05-04T07:10:30.910

I ran your code and cannot reproduce the warning message.

This warning occurs when you assign to the result of a chained indexing operation, because pandas cannot tell whether the object you are modifying is a view of the original DataFrame or a copy of a slice of it. A more in-depth explanation can be found here.
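As a minimal sketch of that ambiguity (the small DataFrame and the `flag` column are made up for illustration), taking an explicit copy of a filtered subset makes the intent unambiguous: the assignment goes to the copy and the original is left alone.

```python
import pandas as pd

# Hypothetical toy data, not the DataFrame from the question.
df = pd.DataFrame({'month': ['1', '1', '2'], 'X1': [30, 42, 10]})

# Boolean filtering returns a new object; .copy() states explicitly
# that an independent DataFrame is wanted, so there is nothing to warn about.
subset = df[df['month'] == '1'].copy()
subset['flag'] = True

print(subset.columns.tolist())  # ['month', 'X1', 'flag']
print(df.columns.tolist())      # ['month', 'X1']
```

If the DataFrame you are clustering was itself produced by filtering a larger one, adding `.copy()` at that point is one possible way to avoid the warning.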

You would get the SettingWithCopyWarning if you applied your cluster function to only the rows where month is '1' using chained indexing, like this:

df[df['month'] == '1']['Cluster_id'] = df[df['month'] == '1'].groupby('month')[cols].apply(cluster)

The correct way to perform the above operation would be:

df.loc[df['month'] == '1', 'Cluster_id'] = df[df['month'] == '1'].groupby('month')[cols].apply(cluster)
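The difference between the two forms can be checked without the clustering step. The sketch below uses a made-up DataFrame and a constant value in place of the cluster labels (both are assumptions for illustration): the chained form writes to a temporary object and leaves df unchanged, while the .loc form writes into df itself.

```python
import warnings
import pandas as pd

# Hypothetical toy data; the constant 0 stands in for the cluster labels.
df = pd.DataFrame({'month': ['1', '1', '2'], 'X1': [30, 42, 10]})
mask = df['month'] == '1'

# Chained indexing: df[mask] produces a temporary object, and the
# assignment lands on that temporary. Warnings are silenced here only
# to keep the output clean; pandas normally warns about this line.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[mask]['Cluster_id'] = 0
chained_changed_df = 'Cluster_id' in df.columns

# Single .loc call: rows and column are selected in one step,
# so the assignment modifies df directly.
df.loc[mask, 'Cluster_id'] = 0
loc_changed_df = 'Cluster_id' in df.columns

print(chained_changed_df, loc_changed_df)
print(df)
```

Rows outside the mask end up as NaN in the new column, since only the selected rows are assigned.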
