I transformed my data to meet the assumptions of a linear model (normality):

d.reg1 = d.reg %>% preProcess("YeoJohnson") %>% predict(d.reg) 

The adjusted model:

fit = lm(log10(Qmld)~log10(Peq750), data = d.reg1) # power-law regression (linear on the log-log scale)

Predicted data:

a=10^fit$coefficients[1]
b=fit$coefficients[2]

d.reg1$Qmld_predita=a*d.reg1$Peq750^b 

How can I back-transform d.reg1$Qmld_predita to the original scale? The model was fitted to transformed data, so the predicted values have no physical meaning for me as they stand.

Mihai Chelaru
  • I don't understand what you are asking. Are you asking what's the inverse of the `log10()` function? I'm also not sure why you are not using `predict()` with your adjusted model as well. Perhaps it would help if you included a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run the code and test it. – MrFlick Jul 07 '17 at 14:27
  • Also keep in mind that the normal distribution assumption is typically on the error term (which we check using the residuals) so you really shouldn't check that until after you fit the model. – Dason Jul 07 '17 at 14:46

2 Answers

Here's a template for a function that can be adapted to whichever transformations were applied initially (in this example, c("scale", "center")).

library(tidyverse)

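# preproc: the fitted preProcess object; data: the transformed data frame.
# digits is not used in the body; see the rounding note further down.
revPredict <- function(preproc, data, digits=0) {
  data %>%
    # keep only the columns that were centered/scaled
    select(one_of(preproc$mean %>% names)) %>%
    # undo scaling: multiply each column by its standard deviation
    map2_df(preproc$std, ., function(sig, dat) dat * sig) %>%
    # undo centering: add each column's mean back
    map2_df(preproc$mean, ., function(mu, dat) dat + mu)
}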

revPredict(preprocess_params, df_needing_reverse_transformation)
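
The arithmetic behind the reverse step can be checked without caret. The sketch below is base R only and uses a hand-built list, `params`, as a hypothetical stand-in for the preProcess object: it reproduces just the two fields the function above reads (`mean` and `std`). The helper name `rev_base` and the toy data are illustrative assumptions.

```r
# Toy data; column names and values are made up for illustration
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 40, 80))

# Hypothetical stand-in for the preProcess object: per-column mean and sd
params <- list(mean = sapply(df, mean), std = sapply(df, sd))

# Forward transform: center, then scale
scaled <- as.data.frame(scale(df, center = params$mean, scale = params$std))

# Reverse transform: multiply by the sd, then add the mean back
rev_base <- function(params, data) {
  cols <- names(params$mean)
  out <- sweep(as.matrix(data[cols]), 2, params$std[cols], `*`)
  out <- sweep(out, 2, params$mean[cols], `+`)
  as.data.frame(out)
}

restored <- rev_base(params, scaled)
all.equal(restored, df, check.attributes = FALSE)
```

The order matters: scaling is undone before centering, the mirror image of the forward transform.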

Since it's been more than six months since the question was asked, I assume you've found a way around this, but the answer may still be useful to others with a similar question.


To round values, pipe the output of the second map2_df to this:

    mutate_if(is.numeric, round, digits = digits)  # funs() is deprecated in recent dplyr
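
The function above only reverses centering and scaling, whereas the question used the Yeo-Johnson transformation. As a rough sketch of how that case could be handled, the base R code below implements the textbook Yeo-Johnson transform and its algebraic inverse for a single, known `lambda`. The function names are hypothetical, the value of `lambda` is made up, and how to retrieve the estimated lambda for each column from the fitted preProcess object is not covered here.

```r
# Forward Yeo-Johnson transform for one lambda (textbook definition)
yeo_johnson <- function(y, lambda) {
  out <- numeric(length(y))
  pos <- y >= 0
  out[pos] <- if (abs(lambda) < 1e-8) {
    log1p(y[pos])
  } else {
    ((y[pos] + 1)^lambda - 1) / lambda
  }
  out[!pos] <- if (abs(lambda - 2) < 1e-8) {
    -log1p(-y[!pos])
  } else {
    -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Inverse: the transform preserves sign, so the branch is chosen from z
yeo_johnson_inverse <- function(z, lambda) {
  out <- numeric(length(z))
  pos <- z >= 0
  out[pos] <- if (abs(lambda) < 1e-8) {
    expm1(z[pos])
  } else {
    (lambda * z[pos] + 1)^(1 / lambda) - 1
  }
  out[!pos] <- if (abs(lambda - 2) < 1e-8) {
    1 - exp(-z[!pos])
  } else {
    1 - (1 - (2 - lambda) * z[!pos])^(1 / (2 - lambda))
  }
  out
}

# Round trip on made-up values with an assumed lambda
y <- c(-3, -0.5, 0, 0.5, 7)
lambda <- 0.4
z <- yeo_johnson(y, lambda)
back <- yeo_johnson_inverse(z, lambda)
all.equal(back, y)
```

This sketch does not handle missing values, and it assumes the values being inverted lie in the range the forward transform can produce for the given `lambda`.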
saladi
  • So caret doesn't apply pre-processing to each training set independently? I thought this was necessary to avoid information leakage, but from your post it seems that all data is scaled and centered together. I assumed there would be one mean and one sd for each variable AND each training/test set. – Giovanni Colitti Jul 24 '19 at 16:11
  • Not sure how it would work if you pre-processed your training and testing separately, but with this example SO, the preprocessing parameters are calculated once for `d.reg` and then can be used. – saladi Jul 25 '19 at 01:23

Here is an addition to the function above: if you scaled your data to the 0-1 range, you can use this version to invert that transformation. This is useful for deep learning.

library(tidyverse)

revPredict <- function(preproc, data, digits = 0, range = FALSE) {
  if (isTRUE(range)) {
    # Undo 0-1 scaling: multiply by (max - min), then add the minimum back
    data <- data %>%
      select(one_of(dimnames(preproc$ranges)[[2]])) %>%
      map2_df(preproc$ranges[2, ] - preproc$ranges[1, ], ., function(min_max, dat) min_max * dat) %>%
      map2_df(preproc$ranges[1, ], ., function(min, dat) min + dat) %>%
      mutate_if(is.numeric, round, digits = digits)
    return(data)
  }
  # Undo center/scale: multiply by the sd, then add the mean back
  data <- data %>%
    select(one_of(names(preproc$mean))) %>%
    map2_df(preproc$std, ., function(sig, dat) dat * sig) %>%
    map2_df(preproc$mean, ., function(mu, dat) dat + mu) %>%
    mutate_if(is.numeric, round, digits = digits)
  return(data)
}
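
As a quick check of the range branch's arithmetic, here is a base R sketch that does not depend on caret. The matrix `ranges` is a hypothetical stand-in that mimics the layout the function above relies on (row 1 holds the column minima, row 2 the maxima); the toy data are made up.

```r
# Toy data for illustration
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 40, 80))

# Hypothetical stand-in: 2 x p matrix, row 1 = minima, row 2 = maxima
ranges <- sapply(df, range)

# Forward transform: rescale every column to the 0-1 interval
scaled01 <- as.data.frame(Map(
  function(col, lo, hi) (col - lo) / (hi - lo),
  df, ranges[1, ], ranges[2, ]
))

# Reverse transform: multiply by (max - min), then add the minimum back
restored <- as.data.frame(Map(
  function(col, lo, hi) col * (hi - lo) + lo,
  scaled01, ranges[1, ], ranges[2, ]
))

all.equal(restored, df)
```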
Andrew Troiano