workclass = X_train[~X_train['workclass'].isnull()]['workclass'].unique()
for dataset in [X_train, X_test]:
    df = dataset[dataset['workclass'].isnull()].index
    size = len(df)
    s = pd.Series([workclass[np.random.randint(0, 8)] for _ in range(size)], index=df, dtype=object)
    dataset.loc[:, 'workclass'] = dataset.loc[:, 'workclass'].fillna(s)

Output

S:\AnacondaPF\lib\site-packages\pandas\core\indexing.py:965: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

The last line gives me a SettingWithCopyWarning even though I use the .loc method. Despite the warning, it fills all the missing values in both datasets.

Can anyone explain why?

Anonymous

1 Answer

I think you should use train_test_split from sklearn to split the data first.

pandas raises SettingWithCopyWarning when it cannot tell whether the DataFrame you are modifying is a copy or a view of the original DataFrame. In that situation the assignment may or may not reach the original data, so pandas warns you instead of guessing.
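As a rough illustration of that point, here is a minimal sketch on a toy DataFrame. The column values and the helper name copy_warnings_for are made up for this example. It runs the same .loc assignment on a filtered slice and on an explicit copy of that slice, and collects any SettingWithCopyWarning. Whether the slice case warns depends on the pandas version (releases with Copy-on-Write enabled may not emit it at all), so only the copy case is expected to stay silent.

```python
import warnings

import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "workclass": ["Private", None, "State-gov", None],
    "age": [39, 50, 38, 53],
})


def copy_warnings_for(frame):
    """Fill missing workclass values and return any SettingWithCopyWarning seen."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame.loc[:, "workclass"] = frame["workclass"].fillna("Private")
    # Match on the class name so the check does not depend on where
    # a given pandas version exposes the warning class.
    return [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]


# A filtered slice: pandas may still associate it with `df`.
subset = df[df["age"] > 38]
from_slice = copy_warnings_for(subset)

# An explicit copy: an independent DataFrame, so there is nothing to warn about.
independent = df[df["age"] > 38].copy()
from_copy = copy_warnings_for(independent)

print("warnings from slice:", len(from_slice))
print("warnings from copy: ", len(from_copy))
```

In both cases the original df keeps its missing values, because boolean filtering already produced new data. The warning is about ambiguity, not about the assignment failing.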

The causes for the SettingWithCopyWarning are explained in more detail here and the issue.

You can treat the warning as a false positive, or you can avoid it by making explicit copies of the split data like this:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)  # X = features, y = labels
# Explicit copies are independent DataFrames, so later assignments do not warn
X_train, X_test = X_train.copy(), X_test.copy()
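
To show how the random fill from the question could look once the two frames are explicit copies, here is a small sketch. It is only one possible way to write it: the toy data is invented, the split is simulated with iloc instead of train_test_split to keep the example self-contained, and it draws from however many categories exist rather than assuming exactly 8.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator, used here only for repeatability

# Toy data, invented for this example.
X = pd.DataFrame({
    "workclass": ["Private", None, "State-gov", "Private", None, "Local-gov"],
    "age": [39, 50, 38, 53, 28, 37],
})

# Stand-in for a train/test split; .copy() makes each part independent of X.
X_train, X_test = X.iloc[:4].copy(), X.iloc[4:].copy()

# Categories observed in the training data (missing values excluded).
categories = X_train["workclass"].dropna().unique().tolist()

for dataset in (X_train, X_test):
    # Index labels of the rows where workclass is missing.
    missing = dataset.index[dataset["workclass"].isnull()]
    # One random category per missing row, aligned on those index labels.
    fill = pd.Series(rng.choice(categories, size=len(missing)),
                     index=missing, dtype=object)
    dataset["workclass"] = dataset["workclass"].fillna(fill)

print(X_train["workclass"].isnull().sum(), X_test["workclass"].isnull().sum())
```

Because X_train and X_test are copies, the fill changes only them and leaves X untouched.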
Sathwick