I have the following lists:
vocab = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['f', 'g', 'h', 'i', 'j']
With the following code, I would like to get a one-hot encoding of each list that has a column for every item in vocab, not just for the items that appear in that list.
import pandas as pd
encoding1 = pd.get_dummies(data=list1, columns=vocab)
encoding2 = pd.get_dummies(data=list2, columns=vocab)
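Checking the column labels shows what `get_dummies` does with this call. As I understand the pandas documentation, when `data` is a plain list it creates one column per distinct value that actually occurs in that list. The `columns` argument only selects which columns of a DataFrame to encode, so passing `vocab` there has no effect on a list:

```python
import pandas as pd

vocab = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
list1 = ['a', 'b', 'c', 'd', 'e']

# With a plain list as `data`, get_dummies builds one column per distinct
# value present in the list; `columns` is only consulted for DataFrame input
encoding1 = pd.get_dummies(data=list1, columns=vocab)
print(list(encoding1.columns))  # ['a', 'b', 'c', 'd', 'e']
```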
I want this output:
encoding1:
   a  b  c  d  e  f  g  h  i  j
1  1  0  0  0  0  0  0  0  0  0
2  0  1  0  0  0  0  0  0  0  0
3  0  0  1  0  0  0  0  0  0  0
4  0  0  0  1  0  0  0  0  0  0
5  0  0  0  0  1  0  0  0  0  0
encoding2:
   a  b  c  d  e  f  g  h  i  j
1  0  0  0  0  0  1  0  0  0  0
2  0  0  0  0  0  0  1  0  0  0
3  0  0  0  0  0  0  0  1  0  0
4  0  0  0  0  0  0  0  0  1  0
5  0  0  0  0  0  0  0  0  0  1
However, I get this output instead:
encoding1:
   a  b  c  d  e
1  1  0  0  0  0
2  0  1  0  0  0
3  0  0  1  0  0
4  0  0  0  1  0
5  0  0  0  0  1
encoding2:
   f  g  h  i  j
1  1  0  0  0  0
2  0  1  0  0  0
3  0  0  1  0  0
4  0  0  0  1  0
5  0  0  0  0  1
What can I do to get the desired output?
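One approach that appears to produce the desired shape, assuming pandas is the only dependency, is to declare `vocab` as the full set of categories with `pd.Categorical`. `get_dummies` then emits a column for every category, even categories that never occur in the list. The helper names `one_hot` and `one_hot_reindex` are hypothetical. The second helper is an alternative: it encodes normally and then uses `DataFrame.reindex` to add the missing vocab columns filled with 0. Both renumber the rows from 1 only to match the layout shown above, since pandas numbers rows from 0 by default.

```python
import pandas as pd

vocab = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = ['f', 'g', 'h', 'i', 'j']


def one_hot(items, vocab):
    # Hypothetical helper: declaring the whole vocabulary as the categories
    # makes get_dummies emit one column per category, even if it is unused
    cat = pd.Categorical(items, categories=vocab)
    encoded = pd.get_dummies(cat).astype(int)  # bool (or uint8) -> 0/1 ints
    # Number rows from 1 to match the layout in the question
    encoded.index = range(1, len(items) + 1)
    return encoded


def one_hot_reindex(items, vocab):
    # Hypothetical alternative: encode only the values present, then add
    # the missing vocab columns (in vocab order) filled with 0
    encoded = pd.get_dummies(items).reindex(columns=vocab, fill_value=0).astype(int)
    encoded.index = range(1, len(items) + 1)
    return encoded


encoding1 = one_hot(list1, vocab)
encoding2 = one_hot(list2, vocab)
print(encoding1)
print(encoding2)
```

With `pd.Categorical`, any item that is not in `vocab` becomes NaN and ends up as an all-zero row, so validating the input first may be worthwhile. The `.astype(int)` is there because pandas 2.0 and later return boolean dummy columns by default.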