Is there a way to apply one-hot encoding to both strings and integers at the same time? DictVectorizer handles strings and OneHotEncoder handles integers. Is there something that combines them, i.e. treats all feature values as categorical regardless of their type?
For example, I have a pandas DataFrame in which some of the columns are integers and some are strings:
>>> df
   a  b  c  d
0  2  0  w  K
1  0  1  f  K
2  1  2  y  L
3  0  0  f  M
All columns are actually categorical. The fact that some of them are integers carries no meaning. Now if I use a DictVectorizer like this:
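from sklearn.feature_extraction import DictVectorizer

vectorizer = DictVectorizer(sparse=False)
# One dict per row, mapping column name -> value
df_dict = df.T.to_dict().values()
vectorizer.fit_transform(df_dict)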
I get a nice big matrix for columns 'c' and 'd', but the values in 'a' and 'b' stay exactly the same. I need them to be one-hot encoded in the same way. One option is, of course, to apply the str function to 'a' and 'b', but that is both implicit (the original data is always integers) and inefficient (it iterates over the whole column, which might be quite big, just to do a wasteful conversion).
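To make the goal concrete, here is a minimal pure-Python sketch of the output I am after, using the rows from the DataFrame above. The helper name one_hot_all is hypothetical (it is not part of scikit-learn or pandas), and the column naming scheme "column=value" is just an assumption for illustration. Every distinct (column, value) pair gets its own indicator column, whether the value is an integer or a string:

```python
# Rows copied from the DataFrame above: 'a' and 'b' hold integers,
# 'c' and 'd' hold strings.
rows = [
    {"a": 2, "b": 0, "c": "w", "d": "K"},
    {"a": 0, "b": 1, "c": "f", "d": "K"},
    {"a": 1, "b": 2, "c": "y", "d": "L"},
    {"a": 0, "b": 0, "c": "f", "d": "M"},
]


def one_hot_all(records):
    """Hypothetical helper: one indicator column per (column, value) pair."""
    # Collect every distinct (column, value) pair; sort on the string form
    # of the value so integers and strings are ordered the same way.
    features = sorted(
        {(col, val) for rec in records for col, val in rec.items()},
        key=lambda cv: (cv[0], str(cv[1])),
    )
    names = ["%s=%s" % cv for cv in features]
    # 1 where the row has exactly that value in that column, else 0.
    matrix = [
        [1 if rec.get(col) == val else 0 for col, val in features]
        for rec in records
    ]
    return names, matrix


names, matrix = one_hot_all(rows)
```

Here names has 12 entries (three distinct values in each of the four columns), and every row of matrix contains exactly four ones, one per original column. This is only meant to pin down the desired result; it is not an efficient implementation.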
Is there a simple way of doing this?
Thanks