I have a list of sentences, each a comma-separated string of words. I want to one-hot encode the list so that each sentence becomes one row and each distinct word becomes one column.
For example,
sentences = [
"python, java",
"linux, windows, ubuntu",
"java, linux, ubuntu, windows",
"performance, python, mac"
]
I want output like this,
   java  linux  mac  performance  python  ubuntu  windows
0     1      0    0            0       1       0        0
1     0      1    0            0       0       1        1
2     1      1    0            0       0       1        1
3     0      0    1            1       1       0        0
My attempt:
I flattened the sentences into a single Series of words and called get_dummies on it,
but that gives me one row per word rather than one row per sentence.
import pandas as pd
print(pd.get_dummies(pd.Series(sum([tag.split(', ') for tag in sentences], []))))
Output:
    java  linux  mac  performance  python  ubuntu  windows
 0     0      0    0            0       1       0        0
 1     1      0    0            0       0       0        0
 2     0      1    0            0       0       0        0
 3     0      0    0            0       0       0        1
 4     0      0    0            0       0       1        0
 5     1      0    0            0       0       0        0
 6     0      1    0            0       0       0        0
 7     0      0    0            0       0       1        0
 8     0      0    0            0       0       0        1
 9     0      0    0            1       0       0        0
10     0      0    0            0       1       0        0
11     0      0    1            0       0       0        0
How can I get one row per sentence instead of one row per word?
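One direction that may work, as a minimal sketch rather than a confirmed answer: pandas has a string method, Series.str.get_dummies, that splits each element on a separator and builds indicator columns, so the sentence boundaries are kept. The sketch assumes every sentence uses exactly ", " (comma plus one space) as the delimiter; the variable name encoded is only for illustration.

```python
import pandas as pd

sentences = [
    "python, java",
    "linux, windows, ubuntu",
    "java, linux, ubuntu, windows",
    "performance, python, mac",
]

# Keep one Series element per sentence (no flattening), then let
# str.get_dummies split each element on the separator and build
# one indicator column per distinct word.
# Assumption: the delimiter is always ", " (comma followed by one space).
encoded = pd.Series(sentences).str.get_dummies(sep=", ")

print(encoded)
```

The difference from the attempt above is that the words are never merged into one flat list, so each row of the result still corresponds to one original sentence.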