I have 3 types of categorical data in my dataframe, df:
df['Vehicles Owned'] = [1, 2, '3+', 2, 1, 2, '3+', 2]
df['Sex'] = ['m','m','f','m','f','f','m','m']
df['Income'] = [42424,65326,54652,9463,9495,24685,52536,23535]
What should I do for df['Vehicles Owned']? Should I one-hot encode it, label-encode it, or convert 3+ to an integer and leave the values as they are? Since the values are ordered, I have used the integer values as they are, but I am looking for suggestions.
For df['Sex'], should I label-encode it or one-hot encode it? Since there is no order, I have used one-hot encoding.
df['Income'] has a lot of variation. Should I convert it to bins (low, medium, and high incomes) and one-hot encode those?
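
For concreteness, here is a minimal sketch of the three options described above, assuming pandas. The column name 'Income Band' and the use of pd.qcut to split income into three equal-sized groups are illustrative assumptions, not a recommendation; the real cut-offs would need to come from the problem domain.

```python
import pandas as pd

df = pd.DataFrame({
    "Vehicles Owned": [1, 2, "3+", 2, 1, 2, "3+", 2],
    "Sex": ["m", "m", "f", "m", "f", "f", "m", "m"],
    "Income": [42424, 65326, 54652, 9463, 9495, 24685, 52536, 23535],
})

# Option for 'Vehicles Owned': keep the order by mapping "3+" to the integer 3.
df["Vehicles Owned"] = (
    df["Vehicles Owned"].astype(str).str.rstrip("+").astype(int)
)

# Option for 'Income': bin into three ordered groups.
# Assumption: terciles via pd.qcut; 'Income Band' is a hypothetical column name.
df["Income Band"] = pd.qcut(df["Income"], q=3, labels=["low", "medium", "high"])

# Option for 'Sex' and the income bins: one-hot encode the unordered categories.
encoded = pd.get_dummies(df, columns=["Sex", "Income Band"], dtype=int)

print(encoded.head())
```

Here 'Vehicles Owned' stays a single ordered integer column, while 'Sex' and the income bins each expand into one 0/1 column per category.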