Please help me understand how to handle this normalization correctly. Suppose I have a dataset whose columns span several classes, including:
1. character
2. factor
3. integer
4. numeric
Re. 1 (character): let's say this is a column of locations: Dallas, Trenton, and Atlanta. To prepare for normalization, I make Dallas=1, Trenton=2, and Atlanta=3, thus converting this column to integer.
Re. 2 (factor): let's say this is a logical column stating whether the salesperson is active: 1=yes, 0=no.
Re. 3a (integer): this column identifies salesperson rank: 1, 2, or 3.
Re. 3b (integer): this column tells how many people are on each salesperson's team: 1, 2, 3, 4...
Re. 4 (numeric): this column discloses sales totals, e.g., 50,000, 100,000, etc.
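To make the setup concrete, here is a minimal sketch of such a data frame, including the manual recoding of the location column described above. All column names and values are made up for illustration; they are not my real data.

```r
# Toy data; column names and values are hypothetical.
sales <- data.frame(
  location  = c("Dallas", "Trenton", "Atlanta", "Dallas"),
  active    = factor(c(1, 0, 1, 1)),
  rank      = c(1L, 2L, 3L, 2L),
  team_size = c(4L, 1L, 3L, 2L),
  total     = c(50000, 100000, 75000, 125000),
  stringsAsFactors = FALSE
)

# Manual recoding of location: Dallas=1, Trenton=2, Atlanta=3.
location_codes <- c(Dallas = 1L, Trenton = 2L, Atlanta = 3L)
sales$location_int <- unname(location_codes[sales$location])

# Inspect the class of each column.
sapply(sales, class)
```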
I want to use min-max normalization, but I have two questions:
First, does it matter that I have various factor and integer classes in addition to numeric? I do not want to undermine a regression model by getting this wrong.
And second, my aim is to perform min-max normalization using:
min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
Is this appropriate with the data classes I have?
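For concreteness, this is roughly how I would apply the function, sketched on made-up data (the data frame and its column names are hypothetical). I restrict it to columns for which `is.numeric()` is TRUE, since that covers both integer and numeric columns, while calling `min()` on a factor raises an error. Whether scaling the integer-coded columns this way is sensible is part of what I am asking.

```r
min_max_norm <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy data; names and values are hypothetical.
sales <- data.frame(
  active    = factor(c(1, 0, 1, 1)),
  rank      = c(1L, 2L, 3L, 2L),
  team_size = c(4L, 1L, 3L, 2L),
  total     = c(50000, 100000, 75000, 125000)
)

# is.numeric() is TRUE for integer and double columns, FALSE for factors.
num_cols <- sapply(sales, is.numeric)

# Scale only the numeric columns; the factor column is left untouched.
sales_scaled <- sales
sales_scaled[num_cols] <- lapply(sales[num_cols], min_max_norm)

sales_scaled
```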