3

I am fairly new to data science. I am working on a use case: predicting sales demand with linear regression, using product number and store number as predictors. There can be many stores and products, each identified by a numeric value. Do I need to standardize or scale these predictors if their values are numeric, unbounded, and on different scales? I believe that if I use an interaction term, I will have to standardize them?

avani jain
    As Hakan said, you'll have to encode and then normalize if you want all models to be available. Depending on the kind of categories you have, you can use a one-hot encoder, a target encoder, or some manual encoding. More information here: https://datascience.stackexchange.com/a/97949/101580 – Adept Aug 05 '21 at 10:05

1 Answer

2

Since these are categorical features, you should encode them properly before using a linear model. Store and product numbers are labels, so their numeric values carry no magnitude that a linear model could use. If you can encode them so that the encoded values have a roughly linear relationship with the target, then standardizing them makes sense; otherwise it doesn't. If you use tree-based models, you don't have to encode them this way, since trees can discover nonlinear relationships on their own.
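As a rough illustration of the encoding step, here is a minimal sketch that one-hot encodes the IDs and fits ordinary least squares. The column names (`store_no`, `product_no`, `sales`) and the toy numbers are made up for this example, and one-hot encoding is only one of several possible encodings:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: the IDs are labels, not quantities.
df = pd.DataFrame({
    "store_no":   [101, 101, 205, 205, 309, 309],
    "product_no": [7, 8, 7, 8, 7, 8],
    "sales":      [10.0, 14.0, 20.0, 24.0, 5.0, 9.0],
})

# Treat the IDs as categories and one-hot encode them.
# drop_first=True drops one level per feature to avoid collinearity
# with the intercept column added below.
X = pd.get_dummies(
    df[["store_no", "product_no"]].astype(str), drop_first=True
).astype(float)
X.insert(0, "intercept", 1.0)

# Ordinary least squares on the encoded design matrix.
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["sales"].to_numpy(), rcond=None)
pred = X.to_numpy() @ coef

print(dict(zip(X.columns, coef.round(2))))
```

The 0/1 dummy columns are already on a common scale, so standardizing them is usually not needed for plain linear regression. With many stores and products the number of columns grows quickly, which is one reason to look at other encodings.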

Edit note: You can try mean-encoding methods, such as a CV loop or an expanding mean.
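A minimal sketch of two mean-encoding variants follows, assuming a pandas DataFrame with hypothetical columns `store_no` and `sales`. The helper `kfold_mean_encode` is written for this example and is not a library function. Both variants aim to keep a row's own target value out of its encoding:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "store_no": [101, 101, 101, 205, 205, 309],
    "sales":    [10.0, 14.0, 12.0, 20.0, 24.0, 5.0],
})
global_mean = df["sales"].mean()

# Expanding mean: each row is encoded with the mean sales of the
# earlier rows from the same store (row order matters here).
grp = df.groupby("store_no")["sales"]
prior_sum = grp.cumsum() - df["sales"]   # sum of earlier rows in the group
prior_cnt = grp.cumcount()               # number of earlier rows in the group
# The first row of each store has no history (0/0 -> NaN): use the global mean.
df["store_expanding"] = (prior_sum / prior_cnt).fillna(global_mean)


def kfold_mean_encode(frame, col, target, n_splits=3, seed=0):
    """CV-loop mean encoding: each fold is encoded with category means
    computed on the other folds only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(len(frame)) % n_splits
    encoded = pd.Series(np.nan, index=frame.index)
    for k in range(n_splits):
        held_out = fold_id == k
        means = frame.loc[~held_out].groupby(col)[target].mean()
        encoded.loc[held_out] = frame.loc[held_out, col].map(means).to_numpy()
    # Categories never seen outside the held-out fold fall back to the global mean.
    return encoded.fillna(frame[target].mean())


df["store_kfold"] = kfold_mean_encode(df, "store_no", "sales")
print(df)
```

The resulting columns are ordinary numeric features, so they can be standardized like any other predictor before fitting a linear model.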

Hakan Akgün