1

I have a data frame where I want to replace the strings with values, but want to create a new column when doing so. Note: this is a small example, I have 81 columns and many are numerical.

       A    B     C     D
0     .5   Ex    Ex    Po
1     35   Gd    TA    Gd
2     52   TA    Fa    Ex
3     47   Bd    Po    Gd

I'm new to python so I can only do this into a new data frame using:

df_train_scaled = df_train.replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])

This outputs the values in place of the string for the same column, but I want to create a new column with the replaced values like below ('original_col_name' + '_scl':

      A    B   B_scl   C   C_scl   D   D_scl
0     .5   Ex    5     Ex    5     Po    1
1     35   Gd    4     TA    3     Gd    4
2     52   TA    3     Gd    4     Ex    5
3     47   Bd    2     Po    1     Gd    4

Ultimately, I would like to select the columns that include the ordinal variables (Ex, Gd, etc.) above, replace them with the scale values (1, 2, 3, etc.), then create new columns to use for modeling. It would also be helpful so that I can run on the validation data set or be prepared to run in production.

EDIT: This doesn't seem like a duplicate because I'm not simply trying to convert categorical data to numerical. I read the label encoder documentation and it just replaces a categorical value with a number. This is an ordinal set where each string represents a rank, which I'm converting to numbers.

Ex - Excellent = 5
Gd - Good = 4
TA - Average = 3
etc.
CGermain
  • 369
  • 7
  • 18
  • where do the numbers come from? I see that `Avg` turns into 3? but I don't see `Avg` in your replace – MattR Feb 21 '18 at 21:36
  • Sorry, I need to edit that. Avg is actually TA. It's a set of variables ranking quality for various attributes using the same scale. I'm converting to numeric to see if it improves model accuracy. – CGermain Feb 21 '18 at 22:31

0 Answers0