What's the easiest way to replace categorical columns of data with codes in Pandas?

Question

I have a table of data in .dta format, which I have read into Python using Pandas. Most of the columns have the categorical data type, and I want to replace them with numerical data that can be used for machine learning, such as boolean (1/0) indicators or integer codes. The trouble is that I can't replace the values directly, because Pandas won't let me assign a value that isn't already one of the column's categories unless I add it first.

I have tried using pd.get_dummies(), but it keeps returning an error:
TypeError: 'columns' is an invalid keyword argument for this function

print(pd.get_dummies(feature).head(), columns=['smkevr', 'cignow', 'dnnow', 
                                               'dnever', 'complst'])

Is there a simple way to replace this data with numerical codes based on the value (for example 'Not applicable' = 0)?

Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. — jezrael, Jun 14 '17 at 12:02

score 0 · Answer 1 · answered Jun 14 '17 at 12:15

I do it the following way:

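import pandas as pd

# Pass `columns` to get_dummies itself, not to print(), and keep the full
# result (calling .head() at this point would keep only the first five rows).
df_dumm = pd.get_dummies(feature, columns=['smkevr', 'cignow', 'dnnow',
                                           'dnever', 'complst'])
print(df_dumm.head())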
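
If integer codes are wanted instead of dummy columns, the `.cat.codes` accessor of a categorical column may be enough. The following is only a sketch: the small `feature` table and its values are invented for illustration, and `to_codes` is a hypothetical helper, not part of Pandas. It moves 'Not applicable' to the front of the category list, when that category exists, so that it receives code 0.

```python
import pandas as pd

# Invented stand-in for the asker's `feature` table (the real data is not shown).
feature = pd.DataFrame({
    "smkevr": pd.Series(["Not applicable", "Yes", "No", "Yes"], dtype="category"),
    "cignow": pd.Series(["No", None, "Yes", "No"], dtype="category"),
})


def to_codes(col, first="Not applicable"):
    """Hypothetical helper: return integer codes for one categorical column."""
    cats = list(col.cat.categories)
    if first in cats:
        # Put `first` at position 0 so that it is encoded as 0.
        cats = [first] + [c for c in cats if c != first]
        col = col.cat.reorder_categories(cats)
    # A code is the position of the value in the category list; missing values get -1.
    return col.cat.codes


coded = pd.DataFrame({name: to_codes(feature[name]) for name in feature.columns})
print(coded)
```

Missing values come out as -1, so they may need separate handling before the codes are passed to a model.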

Jeril
