
I have two functions to standardize my data:

import pandas as pd
from sklearn import preprocessing

def standartChanger(dataFrame):
    # Fit a scaler on the training data; return the scaled frame
    # and the fitted scaler so it can be reused on the test set.
    stdSc = preprocessing.StandardScaler()
    cols = dataFrame.columns
    dfscaled = stdSc.fit_transform(dataFrame)
    dfscaled = pd.DataFrame(dfscaled, columns=cols)
    return dfscaled, stdSc

def standartChangerwithMeanVar(dataFrame, stdSc):
    # Transform the test data with the mean/variance learned on the train set.
    cols = dataFrame.columns
    dataFrame = stdSc.transform(dataFrame)
    dfscaled = pd.DataFrame(dataFrame, columns=cols)
    return dfscaled
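A minimal usage sketch of the two functions above, with assumed toy data: the scaler is fitted on the training frame, and the same fitted object (its mean and variance) is reused on the test frame.

```python
import pandas as pd
from sklearn import preprocessing

# Assumed toy data: two continuous columns.
train = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [10.0, 20.0, 30.0, 40.0]})
test = pd.DataFrame({"x1": [2.5], "x2": [25.0]})

# Fit on train only.
stdSc = preprocessing.StandardScaler()
train_scaled = pd.DataFrame(stdSc.fit_transform(train), columns=train.columns)

# Reuse the same fitted scaler (no refit) on the test set.
test_scaled = pd.DataFrame(stdSc.transform(test), columns=test.columns)
```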

One standardizes the training set and the other the test set. My data frame contains some dummy variables that I don't want to standardize. How can I modify these functions so they leave the 0-1 dummy columns untouched?

Also, in linear regression the coefficients of my dummy variables come out too large, which produces nonsensical predictions. Any idea how to fix that?

Kevin Hernandez
DenizT
  • I provided some info; however, if you have set your mind on using StandardScaler with dummy variables, here is a question that covers that issue: https://stackoverflow.com/questions/37685412/avoid-scaling-binary-columns-in-sci-kit-learn-standsardscaler – Andres Ordorica Aug 05 '20 at 14:53

1 Answer

  1. I would suggest using MinMaxScaler instead: a 0-1 dummy column already has min 0 and max 1, so min-max scaling maps it onto itself and leaves it unchanged, while the continuous columns are rescaled to [0, 1].

  2. For a regression problem, you could try a model that penalizes large coefficients, such as Lasso, which also provides useful information about the features. Lasso has a hyperparameter to tune, alpha: the larger the alpha, the more the model penalizes large coefficients, and with alpha = 0 you recover plain linear regression. So Lasso behaves like ordinary regression, but the penalty keeps the coefficient estimates reasonable and avoids extreme values.
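A minimal sketch of point 1, with assumed toy column names: because `is_member` already contains both 0 and 1, MinMaxScaler leaves it as-is (this holds only while the column actually spans 0 to 1).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "income":    [30.0, 50.0, 90.0],  # continuous feature, gets rescaled
    "is_member": [0.0, 1.0, 1.0],     # dummy variable, already 0-1
})

# Each column is mapped so its min becomes 0 and its max becomes 1.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
```

After scaling, `is_member` is identical to the input, while `income` now lies in [0, 1].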
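And a minimal Lasso sketch for point 2, under assumed synthetic data where only the first feature is informative: the penalty shrinks the irrelevant coefficients toward zero. (Note that scikit-learn discourages `Lasso(alpha=0)`; use `LinearRegression` for the unpenalized case.)

```python
import numpy as np
from sklearn.linear_model import Lasso

# Assumed synthetic data: 5 features, but only feature 0 drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Larger alpha -> stronger penalty on coefficient magnitude.
lasso = Lasso(alpha=0.1).fit(X, y)
```

Inspecting `lasso.coef_` shows the informative coefficient staying near its true value while the others are driven (often exactly) to zero.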

Andres Ordorica