So I have the following data:
>>> test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])
>>> test
0 [a, b, e]
1 [c, a]
2 [d]
3 [d]
4 [e]
I am trying to one-hot encode the values in these lists into my DataFrame, so that the result looks like this:
>>> pd.DataFrame([[1, 1, 0, 0, 1], [1, 0, 1, 0, 0],
...               [0, 0, 0, 1, 0], [0, 0, 0, 1, 0],
...               [0, 0, 0, 0, 1]],
...              columns=['a', 'b', 'c', 'd', 'e'])
a b c d e
0 1 1 0 0 1
1 1 0 1 0 0
2 0 0 0 1 0
3 0 0 0 1 0
4 0 0 0 0 1
I have tried researching and I've found similar problems but none like this. I have attempted:
test.apply(pd.Series)
But that doesn't accomplish the one-hot aspect: it simply unpacks each list into columns by position rather than by value. I'm sure I could figure out a lengthy solution, but I'd be glad to hear if there's a more elegant way to do this.
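To make the shortfall concrete, here is a minimal sketch of what that attempt returns on the sample data. The variable name `unpacked` is only for illustration:

```python
import pandas as pd

test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])

# Each list is expanded by position, so the columns are labelled 0, 1, 2
# instead of 'a'..'e', and shorter lists are padded with NaN.
unpacked = test.apply(pd.Series)

print(unpacked.shape)          # (5, 3)
print(list(unpacked.columns))  # [0, 1, 2]
```

So the same value can land in different columns ('a' is in column 0 for the first row but column 1 for the second), which is why this is not a one-hot encoding.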
Thanks!
EDIT: I am aware that I can iterate through my test series, create a column for each unique value found, and then iterate through test again, flagging the matching columns for each row. But that doesn't seem very pandorable to me, and I'm sure there's a more elegant way to do this.
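For reference, the two-pass approach I'm describing would look roughly like the sketch below. The names `labels` and `encoded` are just placeholders, and sorting the labels is my own assumption to get the columns in a, b, c, d, e order:

```python
import pandas as pd

test = pd.Series([['a', 'b', 'e'], ['c', 'a'], ['d'], ['d'], ['e']])

# First pass: collect every unique value that appears in any of the lists.
labels = sorted({item for row in test for item in row})

# Second pass: start from an all-zero frame, then flag the columns
# whose labels appear in each row's list.
encoded = pd.DataFrame(0, index=test.index, columns=labels)
for idx, row in test.items():
    encoded.loc[idx, row] = 1

print(encoded)
```

This does give the frame I want, but it relies on an explicit Python loop over the rows, which is the part I'd like to avoid.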