My task is to drop all rows containing NaNs and encode all the categorical variables inside of data.

I wrote a function that looks like

def preprocess_data(data):

    data = data.dropna()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data

which takes a DataFrame and returns the processed data. Running this function gives me a warning that says:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't quite get which part of my code is causing this and how to fix it.

Dawn17

2 Answers

1

Tell pandas that data is its own DataFrame (and not a slice of another one) by calling .copy() on the result of dropna():

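from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):

    # .copy() makes the result an independent DataFrame rather than a flagged slice
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data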

A more detailed explanation is available here: https://github.com/pandas-dev/pandas/issues/17476
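
As a rough sketch of the fix above, the snippet below runs the copying version on a small made-up frame and checks that the caller's frame is left untouched. The `mpg` column and all values are assumptions for illustration only; they are not from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_data(data):
    # Work on an explicit copy so the assignment below targets our own frame
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


# Hypothetical input: rows 2 and 3 each contain a missing value
raw = pd.DataFrame({
    'car name': ['nissan', 'dacia', None, 'nissan'],
    'mpg': [30.0, 25.0, 20.0, None],
})

clean = preprocess_data(raw)

print(clean)                      # only the two complete rows remain
print(raw['car name'].tolist())   # the original strings are still there
```

Because the function assigns into its own copy, `raw` keeps its string column and its NaN rows, while `clean` holds the encoded result.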

0

You may need to give more information; the problem might not be in this function. The following code does not produce a warning:

import pandas as pd
from sklearn import preprocessing

def preprocess_data(data):

    data = data.dropna()
    le = preprocessing.LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


# Two-row frame with no NaNs, so dropna() removes nothing
preprocess_data(pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']}))

#   car mode  car name
# 0     juke         1
# 1    logan         0
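
One possible reason the snippet above stays silent is that its input has no NaNs, so dropna() has nothing to remove. That is an assumption, not something confirmed in this thread. The sketch below feeds the same function a made-up frame with a missing value (the `mpg` column is hypothetical) and records any warnings instead of asserting on them, since whether SettingWithCopyWarning appears depends on the installed pandas version.

```python
import warnings

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess_data(data):
    # Same body as in the question: no explicit .copy()
    data = data.dropna()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


# Hypothetical input: the middle row has a NaN, so dropna() really drops a row
cars = pd.DataFrame({
    'car name': ['nissan', 'dacia', 'fiat'],
    'mpg': [30.0, None, 22.0],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')   # record every warning, do not suppress repeats
    result = preprocess_data(cars)

names = [w.category.__name__ for w in caught]
print('SettingWithCopyWarning' in names)   # version-dependent
print(result)
```

If this prints True on your setup, the warning comes from assigning into the frame returned by dropna(), which is where adding .copy() would help.
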
Romain