I have a DataFrame in which each cell holds several values. The cells are semicolon-separated strings, not lists. I have not been able to one-hot encode the values within each column.
Out:
      A                   B                         C
Ella  Red; Blue; Yellow   Circle; Square; Triangle  Small; Medium; Extra big
Mike  Yellow; Red; Blue   Oval; Triangle; Circle    Medium; Big; Extra big
Dave  Yellow; Red; Green  Circle; Square; Triangle  Extra small; Medium; Big
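To reproduce the frame above, something like the following should work. This is a minimal sketch: I am assuming the names (Ella, Mike, Dave) are the index and that the separator is always a semicolon followed by one space.

```python
import pandas as pd

# Sample data copied from the table above; each cell is one string.
# Assumption: the names are the index, not a regular column.
df = pd.DataFrame(
    {
        "A": ["Red; Blue; Yellow", "Yellow; Red; Blue", "Yellow; Red; Green"],
        "B": [
            "Circle; Square; Triangle",
            "Oval; Triangle; Circle",
            "Circle; Square; Triangle",
        ],
        "C": [
            "Small; Medium; Extra big",
            "Medium; Big; Extra big",
            "Extra small; Medium; Big",
        ],
    },
    index=["Ella", "Mike", "Dave"],
)

print(df)
```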
I want the one-hot encoded result to have two-level column headings, like this:
      A                         B                               C
      Red  Blue  Green  Yellow  Circle  Triangle  Square  Oval  ....
Ella  1    1     0      1       1       1         1       0     ....
Mike  1    1     0      1       1       1         0       1     ....
Dave  1    0     1      1       1       1         1       0     ....
I tried the approach below, taken from https://stackoverflow.com/a/67110743/15646168. It helped, but it only works when every column contains the same set of values:
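# Split every cell into indicator columns (the data above uses ';', not ',')
df = df.stack().str.get_dummies(sep=';')
# Remove the space left after each ';' so ' Blue' and 'Blue' share one name
df.columns = df.columns.str.strip()
# Sum the duplicated names, then move (column, value) into the header
df = df.stack().groupby(level=[0,1,2]).sum().unstack(level=[1,2])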
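One direction that might avoid that limitation is to encode each column separately and then join the pieces under their original column names. The following is only a sketch, not a verified solution: it uses a trimmed two-column copy of the sample data, the names `sample` and `encoded` are made up for illustration, and it assumes the separator is exactly a semicolon plus one space.

```python
import pandas as pd

# Trimmed copy of the sample data (columns A and B only).
sample = pd.DataFrame(
    {
        "A": ["Red; Blue; Yellow", "Yellow; Red; Blue", "Yellow; Red; Green"],
        "B": [
            "Circle; Square; Triangle",
            "Oval; Triangle; Circle",
            "Circle; Square; Triangle",
        ],
    },
    index=["Ella", "Mike", "Dave"],
)

# Encode each column on its own, so values from one column never appear
# under another column's heading. Passing a dict to pd.concat turns its
# keys into the top level of the resulting column MultiIndex.
encoded = pd.concat(
    {col: sample[col].str.get_dummies(sep="; ") for col in sample.columns},
    axis=1,
)

print(encoded)
```

Within each top-level heading, `get_dummies` orders the values alphabetically, so the column order differs from the layout shown earlier even though the headings are the same.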
Thank you so much!