I have a dataframe in pandas of categorical data. Each column represents one timestep and each row one individual.
The dataset records the location of each individual at each timestep. Locations are categories such as: 1) the person is picking someone up, 2) the person is at home, 3) the person is at work, etc. There are 13 different locations.
There are 720 columns, each representing a timestep of 2 minutes (one full day in total).
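To make the layout concrete, here is a minimal sketch with made-up data. The number of individuals, the column names and the random values are placeholders for illustration, not my real data; I am also assuming the locations are stored as integer codes 1 to 13.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

n_individuals = 5    # hypothetical; the real dataset has more rows
n_timesteps = 720    # 720 steps of 2 minutes = 1440 minutes = 24 hours
n_locations = 13     # location codes 1..13

# One row per individual, one column per timestep, values are location codes.
# rng.integers excludes the upper bound, hence n_locations + 1.
codes = rng.integers(1, n_locations + 1, size=(n_individuals, n_timesteps))
df = pd.DataFrame(codes, columns=[f"t{i}" for i in range(n_timesteps)])

print(df.shape)
```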
I would like to transform these categorical locations into one-hot vectors. The problem is that a single row may not contain all 13 locations, so each row ends up encoded differently, with no common rule across rows.
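A minimal sketch of the inconsistency on two made-up rows. It assumes the encoding is done row by row with pd.get_dummies, which is only one way to run into this; the row values are invented for illustration.

```python
import pandas as pd

# Two toy individuals; values are location codes (made up for illustration).
row_a = pd.Series([2, 2, 3, 2])   # only codes 2 and 3 appear
row_b = pd.Series([1, 5, 5, 7])   # codes 1, 5 and 7 appear

# get_dummies only creates columns for the values it actually sees.
enc_a = pd.get_dummies(row_a)
enc_b = pd.get_dummies(row_b)

print(enc_a.shape, list(enc_a.columns))   # (4, 2) [2, 3]
print(enc_b.shape, list(enc_b.columns))   # (4, 3) [1, 5, 7]
```

Neither result is 13 wide, and the same column position means a different location in each row, which is the problem I am trying to avoid.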
I also tried to do it manually:
old_l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
new_l = [[1,0,0,0,0,0,0,0,0,0,0,0,0], [0,1,0,0,0,0,0,0,0,0,0,0,0], [0,0,1,0,0,0,0,0,0,0,0,0,0], [0,0,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0,0,0,0], [0,0,0,0,0,1,0,0,0,0,0,0,0], [0,0,0,0,0,0,1,0,0,0,0,0,0], [0,0,0,0,0,0,0,1,0,0,0,0,0],
[0,0,0,0,0,0,0,0,1,0,0,0,0], [0,0,0,0,0,0,0,0,0,1,0,0,0], [0,0,0,0,0,0,0,0,0,0,1,0,0], [0,0,0,0,0,0,0,0,0,0,0,1,0],
[0,0,0,0,0,0,0,0,0,0,0,0,1]]
df.replace(old_l, new_l, inplace=True)
But I get this error: ValueError: cannot assign mismatch length to masked array
Does anyone know a way to do this?
Thanks!