0

I have a model where I scaled a variable, x, then used the scaled x and the square of scaled x as predictors. I.e., lm(y ~ I(scale(x)) + I(scale(x)^2). I want to get the coefficients to be applicable to the original x units, but I'm having a hard time figuring out how to do this. I think the trouble is because of squaring after scaling, but I don't know.

This answer got me quite close, I think. Again, the difference might be the squaring after scaling.

Below is an R script that shows my process in creating fake data and trying to unscale. Sorry, I guess it's a little long.

What's peculiar is that at the end, I get the rescaling right for the squared terms, but not the linear terms or the intercept. Given that I used the method from the aforementioned answer, I'd have guessed I would get the squared coefficients wrong!

How do I 'unscale' the parameters?

set.seed(1)
# ---- Create Covariates ----
# Define dimensions for simulation
n <- 100
# Covariate Simulation Parameters
mean.temp <- 10
sd.temp <- 5
mean.depth <- 200
sd.depth <- 20
# Simulate Covariates
temperature <- rnorm(n=n, mean=mean.temp, sd=sd.temp)
depth <- rnorm(n=n, mean=mean.depth, sd=sd.depth)
# Create Unscaled Data Matrix
dmat <- data.frame( 
    temp=temperature, temp2=temperature^2, 
    depth=depth, depth2=depth^2
)
# Scale Covariates
temp.scale <- scale(temperature)
depth.scale <- scale(depth)
# Create Scaled Data Matrix
dmat.scale <- data.frame(
    temp=temp.scale, temp2=temp.scale^2, 
    depth=depth.scale, depth2=depth.scale^2
)
# Record Scaling Factors
mu.vec <- c(
    "temp.mu"=attr(temp.scale,'scaled:center'),
    "temp2.mu"= attr(temp.scale,'scaled:center')^2, # is this right?
    "depth.mu"=attr(depth.scale,'scaled:center'),
    "depth2.mu"= attr(depth.scale,'scaled:center')^2 # is this right?
)
sd.vec <- c(
    "temp.sd"=attr(temp.scale,'scaled:scale'),
    "temp2.sd"= attr(temp.scale,'scaled:scale')^2,
    "depth.sd"=attr(depth.scale,'scaled:scale'),
    "depth2.sd"= attr(depth.scale,'scaled:scale')^2
)
# ---- Create Parameters ----
beta <- matrix(c(0.5, 0.1, -0.8, 1.2, -0.1),ncol=1)
# ---- Simulate ----
eps <- rnorm(n=n, mean=0, sd=0.001)
y <- (cbind(1,as.matrix(dmat))%*%beta + eps)[,1]
# ---- Fit Models ----
mod <- lm(y~as.matrix(dmat))
mod.scale <- lm(y~as.matrix(dmat.scale))

beta.orig <- coef(mod)
beta.scale <- coef(mod.scale)
# ---- Rescale Function ----
# From: https://stackoverflow.com/a/23643740/2343633
rescale.coefs <- function(beta,mu,sigma) {
    beta2 <- beta ## inherit names etc.
    beta2[-1] <- sigma[1]*beta[-1]/sigma[-1]
    beta2[1]  <- sigma[1]*beta[1]+mu[1]-sum(beta2[-1]*mu[-1])
    beta2
}
beta.rescale <- rescale.coefs(beta.scale, mu=c(0,mu.vec), sigma=c(1,sd.vec))

# ---- Compare ----
beta.orig
beta.rescale
Community
  • 1
  • 1
rbatt
  • 4,677
  • 4
  • 23
  • 41

1 Answers1

0

Okay, got it. The intercept should just be 1 value, the other arguments can be vectors of the same length with corresponding elements associated with the same covariate (e.g., first element of each is associated with temperature or temperature squared, second with depth).

Rescaling the quadratic term is the easiest, you just divide by the sd()^2. Previously I was only doing the first term for the linear coefficient unscaling, and the second term there involves a lot more. I had a lot wrong in the intercept rescaling (missing the entire third term, didn't divide by sigma in the second term).

unscale <- function(intercept, betaLinear, betaQuad, mu, sigma){
    i2 <- intercept + sum(-(betaLinear*mu/sigma) + betaQuad*mu^2/sigma^2)
    bl2 <- betaLinear/sigma - 2*betaQuad*mu/sigma^2
    bq2 <- betaQuad/sigma^2
    return(c(i2, bl2, bq2))
}
beta.unscale <- unscale(
    intercept = beta.scale[1], 
    betaLinear = beta.scale[c(2,4)], 
    betaQuad = beta.scale[c(3,5)], 
    mu = mu.vec[c(1,3)], 
    sigma = sd.vec[c(1,3)]
)[names(beta.scale)]
beta.orig
beta.unscale

Edit

Or, if you want a function that works quickly when you have thousands of each parameter, here's the vectorized version:

unscale <- function(a1, a2, a3, a4, a5, mu1, mu2, sigma1, sigma2){
    # i2 <- intercept + sum(-(betaLinear*mu/sigma) + betaQuad*mu^2/sigma^2)
    i2 <- a1 - a2*mu1/sigma1 - a4*mu2/sigma2 + a3*mu1^2/sigma1^2 + a5*mu2^2/sigma2^2
    # bl2 <- betaLinear/sigma - 2*betaQuad*mu/sigma^2
    a22 <- a2/sigma1 - 2*a3*mu1/sigma1^2
    a42 <- a4/sigma2 - 2*a5*mu2/sigma2^2
    # bq2 <- betaQuad/sigma^2
    a32 <- a3/sigma1^2
    a52 <- a5/sigma2^2

    if(length(a1)==1){
       return(c(i2, a22, a32, a42, a52))
    }else{
        return(cbind(i2, a22, a32, a42, a52))
    }
}
rbatt
  • 4,677
  • 4
  • 23
  • 41