I am trying to turn categorical columns into a set of boolean columns, one per unique value in the original column, where each value shows whether that row belongs to the category.

My latest attempt uses this user-defined function:

def cath_column(df, col):  # assumes numpy is imported as np
    u_values = np.sort(df[col].unique())  # sorted unique categories

    for v in u_values:
        df[col + '.' + str(v)] = df[col] == v  # one boolean column per category

    return df.copy()

It seems to work in a timely manner, but pandas emits a SettingWithCopyWarning. The full text of the warning is:

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

I've read the documentation, but I am not sure it applies in this situation, as I am not modifying values in a DataFrame but adding columns. I tried other ways to add columns, such as pandas.concat with axis=1, but it performed very badly.

Is there a better way to do this, or can the SettingWithCopyWarning be safely ignored?
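
To make the situation concrete, here is a minimal sketch of one way the warning could arise. It assumes, since the question does not show how df was built, that df is itself a filtered selection of another DataFrame. The names cath_column_copy, raw, and subset are hypothetical. Copying before adding columns keeps the caller's frame untouched, which is the usual way to avoid this warning:

```python
import numpy as np
import pandas as pd

def cath_column_copy(df, col):
    # Hypothetical variant: work on an explicit copy so the input is not modified
    out = df.copy()
    for v in np.sort(out[col].unique()):
        # One boolean column per unique value, named "<col>.<value>"
        out[f"{col}.{v}"] = out[col] == v
    return out

# Assumed setup: df comes from filtering a larger frame
raw = pd.DataFrame({"color": ["red", "blue", "red"], "n": [1, 2, 3]})
subset = raw[raw["n"] > 1]

encoded = cath_column_copy(subset, "color")
```

With this variant, subset keeps its original two columns and only encoded gains the boolean columns.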

ychvez
  • What you are trying to do is called 'one-hot encoding' and there are many implementations of it, for example in sklearn: sklearn.preprocessing.OneHotEncoder – Roim Jun 29 '23 at 19:32
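
As an illustration of the comment's point, pandas has a built-in for this as well. The sketch below uses pandas.get_dummies rather than the sklearn encoder named in the comment. The sample data and the choice of "." as the separator (to match the column names in the question) are assumptions:

```python
import pandas as pd

# Assumed sample data; the real frame is not shown in the question
df = pd.DataFrame({"color": ["red", "blue", "red"]})

# One boolean column per category, named "<col>.<value>"
dummies = pd.get_dummies(df["color"], prefix="color", prefix_sep=".", dtype=bool)

# Attach all new columns in a single concat instead of one at a time
result = pd.concat([df, dummies], axis=1)
```

Building all the indicator columns first and concatenating once avoids repeated column insertion, which may also help with the performance concern raised in the question.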
