I have a table of data in .dta format which I have read into Python using pandas. Most of the columns have the categorical dtype, and I want to replace them with numerical data that can be used for machine learning, such as boolean (1/0) indicators or integer codes. The trouble is that I can't replace the values directly: pandas won't let me assign a value that isn't already one of the column's categories unless I add it as a category first.

I have tried using pd.get_dummies(), but it keeps raising an error:
TypeError: 'columns' is an invalid keyword argument for this function

print(pd.get_dummies(feature).head(), columns=['smkevr', 'cignow', 'dnnow', 
                                               'dnever', 'complst'])

Is there a simple way to replace this data with numerical codes based on the value (for example 'Not applicable' = 0)?

Ewan Jones
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Jun 14 '17 at 12:02
  • You're passing `columns` to `print`, not `pd.get_dummies`. – BallpointBen Jun 14 '17 at 12:06

1 Answer

I do it the following way:

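import pandas as pd

# Pass `columns` to pd.get_dummies (not to print) so that only these
# columns are one-hot encoded; the dummy columns are named automatically.
df_dumm = pd.get_dummies(feature, columns=['smkevr', 'cignow', 'dnnow',
                                           'dnever', 'complst'])
print(df_dumm.head())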
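
If integer codes are wanted instead of dummy columns (the 'Not applicable' = 0 case from the question), something like the following may work. This is only a sketch: the small `feature` frame and the labels in `mapping` are invented for illustration and are not taken from the real .dta file.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; the values are made up.
feature = pd.DataFrame({
    "smkevr": pd.Categorical(["Yes", "No", "Not applicable", "No"]),
    "cignow": pd.Categorical(["No", "Not applicable", "Not applicable", "Yes"]),
})

# Option 1: let pandas assign the integers. Each code is the position of
# the value in that column's list of categories.
codes = feature.apply(lambda col: col.cat.codes)

# Option 2: choose the integers yourself, so 'Not applicable' becomes 0.
# The labels in this mapping are assumptions about the data.
mapping = {"Not applicable": 0, "No": 1, "Yes": 2}
mapped = feature.apply(lambda col: col.astype(object).map(mapping)).astype("int64")

print(codes)
print(mapped)
```

Option 1 needs no knowledge of the labels, but the number each label gets depends on the category order. Option 2 keeps the numbering explicit, at the cost of listing every label by hand; any label missing from the mapping becomes NaN, which makes the final cast to int64 fail.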
Jeril