I have a data frame where I want to replace the strings with values, but want to create a new column when doing so. Note: this is a small example, I have 81 columns and many are numerical.
A B C D
0 .5 Ex Ex Po
1 35 Gd TA Gd
2 52 TA Fa Ex
3 47 Bd Po Gd
I'm new to python so I can only do this into a new data frame using:
df_train_scaled = df_train.replace(['Ex','Gd','TA','Fa','Po'], [5, 4, 3, 2, 1])
This outputs the values in place of the string for the same column, but I want to create a new column with the replaced values like below ('original_col_name' + '_scl':
A B B_scl C C_scl D D_scl
0 .5 Ex 5 Ex 5 Po 1
1 35 Gd 4 TA 3 Gd 4
2 52 TA 3 Gd 4 Ex 5
3 47 Bd 2 Po 1 Gd 4
Ultimately, I would like to select the columns that include the ordinal variables (Ex, Gd, etc.) above, replace them with the scale values (1, 2, 3, etc.), then create new columns to use for modeling. It would also be helpful so that I can run on the validation data set or be prepared to run in production.
EDIT: This doesn't seem like a duplicate because I'm not simply trying to convert categorical data to numerical. I read the label encoder documentation and it just replaces a categorical value with a number. This is an ordinal set where each string represents a rank, which I'm converting to numbers.
Ex - Excellent = 5
Gd - Good = 4
TA - Average = 3
etc.