My data:
Rank Platforms Technology
high Windows||Linux Unity
high Linux
low Windows Unreal
low Linux||MacOs GameMakerStudio||Unity||Unreal
low GameMakerStudio
low
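To make the example reproducible, the sample data can be built directly as a DataFrame. This is a sketch that assumes blank cells are missing values (NaN) and that the single value in the fifth row belongs to Technology, which is what the desired output below implies.

```python
import numpy as np
import pandas as pd

# Sample data from the question. Blank cells are assumed to be NaN, and
# "GameMakerStudio" in row 5 is assumed to be a Technology value.
df = pd.DataFrame({
    "Rank": ["high", "high", "low", "low", "low", "low"],
    "Platforms": ["Windows||Linux", "Linux", "Windows", "Linux||MacOs",
                  np.nan, np.nan],
    "Technology": ["Unity", np.nan, "Unreal", "GameMakerStudio||Unity||Unreal",
                   "GameMakerStudio", np.nan],
})
print(df)
```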
I want to convert it to something like this:
Rank platform_Windows platform_Linux platform_MacOs technology_Unity technology_Unreal technology_GameMakerStudio
high 1 1 0 1 0 0
high 0 1 0 0 0 0
low 1 0 0 0 1 0
low 0 1 1 1 1 1
low 0 0 0 0 0 1
low 0 0 0 0 0 0
It is essentially multi-label one-hot encoding. I have tried the approaches from several answers:
- How to one-hot-encode from a pandas column containing a list?
- Pandas get_dummies to create one hot with separator = ' ' and with character level separation [duplicate]
The issues are:
- none of them shows how to split my lists on the "||" delimiter;
- none of them shows how to prefix the new column names, for example with "platform_" and "technology_". I need the prefix to know which original column each new column comes from.
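Both issues can be seen on a toy Series. A minimal sketch, assuming pandas 1.4 or later for the regex=False argument: Series.str.split treats a pattern longer than one character, such as "||", as a regular expression unless regex=False is passed, and "||" as a regex matches the empty string rather than the literal pipes. Escaping the pipes also works. For the names, pd.get_dummies already takes a prefix argument (joined with prefix_sep, "_" by default).

```python
import pandas as pd

s = pd.Series(["Windows||Linux", "Linux||MacOs"])

# Literal split on "||" (regex=False requires pandas >= 1.4).
literal = s.str.split("||", regex=False)
# Equivalent regex split with the pipes escaped.
escaped = s.str.split(r"\|\|")

# prefix="platform" plus the default prefix_sep="_" gives "platform_<label>".
named = pd.get_dummies(pd.Series(["Windows", "Linux"]), prefix="platform", dtype=int)
print(literal.tolist())
print(named)
```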
My current code is:
import pandas as pd

# One-hot encode Platforms and join the dummies to the remaining columns
df.drop('Platforms', 1).join(
    pd.get_dummies(
        pd.DataFrame(df.Platforms.str.split("||").tolist()).stack(),
        prefix=['platform']
    ).sum(level=0)
)
# Same for Technology
df.drop('Technology', 1).join(
    pd.get_dummies(
        pd.DataFrame(df.Technology.str.split("||").tolist()).stack(),
        prefix=['technology']
    ).sum(level=0)
)
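A possible corrected version of the code above, offered as a sketch rather than a definitive fix. It assumes pandas 1.4 or later, and one_hot is a hypothetical helper name. The changes: split on the literal "||" with regex=False; leave missing cells as NaN, which pd.get_dummies turns into all-zero rows; pass prefix as a plain string (a list is intended for DataFrame input); collapse the exploded rows with groupby(level=0).max(), since sum(level=0) was removed in pandas 2.0; and drop both source columns with columns= in a single chain instead of starting from df twice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Rank": ["high", "high", "low"],
    "Platforms": ["Windows||Linux", "Linux", np.nan],
    "Technology": ["Unity", np.nan, "Unreal"],
})

def one_hot(frame, column, prefix):
    """Hypothetical helper: 0/1 columns for a '||'-separated column."""
    # Literal split; NaN cells stay NaN and explode to a single NaN row.
    parts = frame[column].str.split("||", regex=False).explode()
    # dummy_na defaults to False, so NaN rows become all zeros.
    dummies = pd.get_dummies(parts, prefix=prefix, dtype=int)
    # One row per original index; max keeps values at 0/1 even if a
    # label were repeated inside one cell.
    return dummies.groupby(level=0).max()

result = (
    df.drop(columns=["Platforms", "Technology"])
    .join(one_hot(df, "Platforms", "platform"))
    .join(one_hot(df, "Technology", "technology"))
)
print(result)
```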
But the error I get is:
TypeError: object of type 'float' has no len()
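The TypeError most likely comes from the empty cells. Assuming they are read in as NaN (as pd.read_csv does by default for blank fields), str.split returns NaN, a float, for those rows, so tolist() yields a mix of lists and floats, and building a DataFrame from that list fails because a float has no length. A minimal reproduction with one possible workaround; regex=False (pandas 1.4 or later) is used here only so the split itself is not a second source of trouble.

```python
import numpy as np
import pandas as pd

platforms = pd.Series(["Windows||Linux", np.nan])

# The missing cell stays NaN (a float), so the list mixes lists and a float.
rows_with_nan = platforms.str.split("||", regex=False).tolist()
try:
    pd.DataFrame(rows_with_nan)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Filling missing cells with "" first gives every row a list.
filled_rows = platforms.fillna("").str.split("||", regex=False).tolist()
print(pd.DataFrame(filled_rows))
```

With fillna(""), the empty string becomes a label of its own, so after stacking it would show up as an extra, empty-named dummy column that has to be dropped.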
I have read the documentation for pandas.get_dummies and pandas.Series.str.get_dummies. The latter accepts a custom delimiter (sep), while the former accepts custom prefixes for the new column names (prefix), so neither seems to do both on its own.
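One way to get both features is to chain Series.str.get_dummies, which supplies the split, with DataFrame.add_prefix, which supplies the prefix. This is a sketch under a few assumptions: blank cells are NaN; the values never contain a single "|" of their own; and "||" is first replaced by "|" with a literal (non-regex) replace, so the sketch does not depend on how str.get_dummies handles a multi-character sep. The prefixes mapping is a hypothetical name for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Rank": ["high", "high", "low", "low", "low", "low"],
    "Platforms": ["Windows||Linux", "Linux", "Windows", "Linux||MacOs",
                  np.nan, np.nan],
    "Technology": ["Unity", np.nan, "Unreal", "GameMakerStudio||Unity||Unreal",
                   "GameMakerStudio", np.nan],
})

# Hypothetical mapping: source column -> prefix for its dummy columns.
prefixes = {"Platforms": "platform_", "Technology": "technology_"}

encoded = []
for col, prefix in prefixes.items():
    dummies = (
        df[col]
        # Literal replace so str.get_dummies only sees a one-character sep.
        .str.replace("||", "|", regex=False)
        # One 0/1 column per label; NaN cells become all-zero rows.
        .str.get_dummies(sep="|")
        # Record which source column each dummy came from.
        .add_prefix(prefix)
    )
    encoded.append(dummies)

result = pd.concat([df.drop(columns=list(prefixes))] + encoded, axis=1)
print(result)
```

The dummy columns come out in sorted label order; if the exact column order of the desired output matters, reorder them afterwards by selecting the columns in that order.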