
Please help me understand how to ensure I correctly handle this normalization. Suppose one has a dataset containing multiple classes, including:

  1. character
  2. factor
  3. integer
  4. numeric

Re. 1 (character): let's say this is a column of locations: Dallas, Trenton, and Atlanta. To prepare for normalization, I make Dallas=1, Trenton=2, and Atlanta=3, thus converting this column to integer.

Re. 2 (factor): let's say this is a logical column stating whether a salesperson is active: 1=yes, 0=no.

Re. 3a (integer): this column identifies salesperson rank: 1, 2, or 3.

Re. 3b (integer): this column tells how many people are on salespersons' teams: 1, 2, 3, 4...

Re. 4 (numeric): this column discloses sales totals, e.g., 50,000, 100,000, etc.

I want to use min-max normalization, but I have two questions:

First, does it matter that I have various factor and integer classes in addition to numeric? I do not want to undermine a regression model by getting this wrong.

And second, my aim is to perform min-max normalization using

min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

Is this appropriate with the data classes I have?
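
To make the question concrete, here is a minimal sketch of applying the function to the quantitative columns only. The data frame `sales`, its column names, and all values are hypothetical, invented for illustration; treating `rank` as something to leave unscaled is also an assumption, not a settled choice.

```r
# Hypothetical data; column names and values are made up for illustration.
sales <- data.frame(
  location  = c("Dallas", "Trenton", "Atlanta", "Dallas"),
  active    = factor(c(1, 0, 1, 1)),
  rank      = c(1L, 2L, 3L, 2L),
  team_size = c(1L, 2L, 3L, 4L),
  total     = c(50000, 100000, 75000, 60000)
)

min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Inspect the class of every column before deciding what to rescale.
sapply(sales, class)

# Rescale only the quantitative columns; location, active and rank are left untouched.
scaled <- sales
quant_cols <- c("team_size", "total")
scaled[quant_cols] <- lapply(sales[quant_cols], min_max_norm)
```

Applying `min_max_norm()` to a character or factor column would fail or give meaningless output, which is why the sketch selects columns by name rather than rescaling the whole data frame.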

RKeithL
  • Why do you want to normalise the data? It makes no sense to normalise your categorical variables, or even to treat them as numerics. – George Savva May 05 '22 at 16:30
  • Working toward a predictive model, want to balance significance of predictors – RKeithL May 05 '22 at 16:32
  • how would that help a predictive model? your model fit and predictions will be the same however you scale the predictors. and again, you shouldn't be treating your categorical predictors as numeric – George Savva May 05 '22 at 16:33
  • Thanks for your replies, @GeorgeSavva. If I don't convert "Dallas" to numeric, how will a lm() incorporate that value in its predictions? I wonder if you could elaborate a bit on "it makes no sense," because it seems commonplace to use 1s and 0s for at least some categoricals. – RKeithL May 05 '22 at 16:42
  • lm will automatically expand factor/character predictors into a series of binary (0/1) variables – George Savva May 05 '22 at 16:49
  • Search for "[r] lm categorical", e.g. https://stackoverflow.com/questions/30159162/linear-model-with-categorical-variables-in-r – Jon Spring May 05 '22 at 16:49
  • `model.matrix()` will convert categorical fields into columns of binary data for you. You have to drop one of the columns, though (the code here drops the first), or you'll have perfectly collinear fields. This is how you'd use that function, if your data frame was `d` and your categorical field was named `sector`: `Xsec = cbind(model.matrix(~d$sector + 0))[,-1]` – Kat May 08 '22 at 06:54
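
The points raised in the comments can be checked with a short sketch. It assumes a hypothetical data frame `d` with invented columns `location`, `team_size` and `total`; none of the values come from the question. It shows `lm()` building the 0/1 indicator columns for a factor on its own, and the fitted values staying the same after min-max scaling of a numeric predictor.

```r
# Hypothetical data, invented for illustration only.
d <- data.frame(
  location  = factor(c("Dallas", "Trenton", "Atlanta",
                       "Dallas", "Trenton", "Atlanta")),
  team_size = c(1, 2, 3, 4, 2, 5),
  total     = c(50000, 100000, 75000, 60000, 90000, 120000)
)

min_max_norm <- function(x) (x - min(x)) / (max(x) - min(x))

# lm() expands the factor `location` into indicator columns by itself.
fit_raw <- lm(total ~ location + team_size, data = d)
colnames(model.matrix(fit_raw))

# Same model with the numeric predictor rescaled to [0, 1].
d_scaled <- transform(d, team_size = min_max_norm(team_size))
fit_scaled <- lm(total ~ location + team_size, data = d_scaled)

# The coefficients change units, but the fitted values agree.
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_scaled)))
```

With the default treatment contrasts, one level of `location` becomes the reference and is absorbed into the intercept, which is the same idea as dropping one column from the `model.matrix()` output by hand.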
