# Here I scaled from 1-10 to 0-1, but I couldn't do the logarithmic transformation
train[c(12)] <-lapply(train[c(12)], function(x){(x-min(x))/(max(x)-min(x))})
test[c(12)] <-lapply(test[c(12)], function(x){(x-min(x))/(max(x)-min(x))})

head(train)

Can you please help me? I need to apply the transformation log(x/(1-x)).
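A minimal base-R sketch of the transformation, assuming the target is a numeric vector on a 1 to 10 scale. The vectors `y_train` and `y_test` are made-up example data, and `scale_01`, `logit_clamped`, and `eps` are hypothetical names. `qlogis(p)` from the base `stats` package computes `log(p/(1-p))`. Because min-max scaling maps the smallest and largest observed values to exactly 0 and 1, their logits are `-Inf` and `Inf`. One common workaround, shown here as an assumption and not as the professor's method, is to clamp the scaled values into `[eps, 1 - eps]` first. The question's code scales `test` with its own minimum and maximum; this sketch reuses the training range so that both sets share one scale.

```r
# Hypothetical example data: a target on a 1-10 scale (not the asker's data)
y_train <- c(1, 3, 5, 7, 10)
y_test  <- c(2, 4, 10)

# Min-max scale using the *training* range so both sets share one scale
scale_01 <- function(x, lo, hi) (x - lo) / (hi - lo)

# Logit log(p / (1 - p)); values are first clamped into [eps, 1 - eps]
# (eps is an assumed tuning value) so that exact 0s and 1s stay finite
logit_clamped <- function(p, eps = 1e-3) {
  p <- pmin(pmax(p, eps), 1 - eps)
  qlogis(p)  # stats::qlogis(p) equals log(p / (1 - p))
}

lo <- min(y_train)
hi <- max(y_train)

p_train <- scale_01(y_train, lo, hi)
p_test  <- scale_01(y_test, lo, hi)

z_train <- logit_clamped(p_train)
z_test  <- logit_clamped(p_test)

print(qlogis(p_train))  # unclamped: first value is -Inf, last is Inf
print(z_train)          # clamped: all values are finite
```

The choice of `eps` directly sets how far the extreme categories land from the rest, so it is worth checking how sensitive the regression is to it.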

  • What part didn't work? What do you intend to do with infinite values, which this equation will get at x = 1? – camille Jan 20 '22 at 16:35
  • It's for a ridge and a linear regression. This is my target variable, and the professor told me to first scale it from 0 to 1 and then apply this log transformation, but I didn't understand why. It was something about mapping the range 0 to 1 onto minus infinity to plus infinity. I don't really remember exactly what he said, but it was something like that. – Jona Cardamone Jan 20 '22 at 16:40
  • After this is done, running `lapply(train[c(12)], function(x) log(x/(1-x)))` works as expected; I'm not sure what issues you're seeing. Please consider making this question reproducible by including sample data, any warnings or errors you may be seeing, and your expected output. (For discussions on including sample data in pristine formats, see https://stackoverflow.com/q/5963269, the minimal reproducible example guidance, and https://stackoverflow.com/tags/r/info, and then use `dput(.)`. Thanks) – r2evans Jan 20 '22 at 16:40
  • As to "why" you should do a log transform, that is best answered in an academic forum (such as Cross Validated, in the classroom, or in office hours with the prof), and is well-informed by *context* (such as: what are the values? what do you intend to do with the log-transforms?). Note that while StackOverflow has a wide scope in its current use, it is really not intended to be hypothetical or academic. – r2evans Jan 20 '22 at 16:42
  • There might be a misunderstanding here. Logistic regression is: `log(p/(1-p)) = b_0+b_1x_1+...+b_nx_n`, with `p` the probability that `y=1`, `b_i` the i-th regression coefficient, and `x_i` the i-th regressor. When solving the logistic regression problem, however, one solves `p/(1-p) = exp(b_0+...+b_nx_n)`. Still, `y` should not be log transformed; the right-hand side is transformed to represent `p(y=1)` (and not `y` itself) after fitting the regression coefficients. – Baraliuh Jan 20 '22 at 18:01
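The "0 to 1 onto minus infinity to plus infinity" idea can be sketched as follows, assuming a bounded continuous target and a plain `lm()` fit. A ridge fit would follow the same transform, fit, back-transform pattern but is not shown. The data frame `df` and its columns are made up. The model is fit on the unbounded logit scale, and predictions are mapped back with the inverse logit `plogis(z) = 1/(1 + exp(-z))`, which lies strictly between 0 and 1 and can then be rescaled to 1 to 10.

```r
# Hypothetical training data: target y on a 1-10 scale, one predictor x
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(2, 3, 5, 6, 8, 9))

# Scale to (0, 1) with the known 1-10 range, then apply the logit
df$z <- qlogis((df$y - 1) / (10 - 1))

# Fit an ordinary linear model on the unbounded logit scale
fit <- lm(z ~ x, data = df)

# Predict for new x values, including one far outside the training range
new_x <- data.frame(x = c(0, 3.5, 20))
z_hat <- predict(fit, newdata = new_x)

# Back-transform: plogis() is the inverse logit 1 / (1 + exp(-z)), so
# (up to floating-point rounding for extreme z) predictions stay inside 1-10
y_hat <- 1 + (10 - 1) * plogis(z_hat)
print(y_hat)
```

A linear model fit directly on the 1 to 10 values could predict outside that range for `x = 20`; the logit scale avoids that by construction.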

1 Answer


I'm not sure what your functions do to rescale, but in general, to rescale a variable from one range to another, you can combine the old and new ranges into one equation:

  (old_variable - old_min) / (old_max - old_min) = (new_variable - new_min) / (new_max - new_min)
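Solving that equation for `new_variable` in general gives `new_variable = new_min + (old_variable - old_min) * (new_max - new_min) / (old_max - old_min)`. A minimal R sketch of that general form follows. The function name `rescale_range` and its argument names are hypothetical, chosen to mirror the equation.

```r
# Hypothetical helper: linearly map values from [old_min, old_max]
# onto [new_min, new_max] by solving the equation above for new_variable
rescale_range <- function(old_variable, old_min, old_max,
                          new_min = 0, new_max = 1) {
  new_min + (old_variable - old_min) * (new_max - new_min) / (old_max - old_min)
}

rescale_range(c(1, 5.5, 10), old_min = 1, old_max = 10)  # returns 0, 0.5, 1
```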

You can then solve for what you need, using the values you indicated (old range 1 to 10, new range 0 to 1):

(old_variable - 1)/(10 - 1) = (new_variable - 0) / (1 - 0)
(old_variable - 1)/(10 - 1) = new_variable

Express as a function:

scale_function <- function(old_variable){
    (old_variable - 1) / (10 - 1)
}

You want to log transform, so you can use the result of the above function:

log(scale_function(old_variable) / (1 - scale_function(old_variable)))

So, you can compute scale_function first and plug its output into this log expression, write a log function that takes the output of scale_function as its argument, or incorporate the log step into scale_function itself, as here:

scale_log_function <- function(old_variable){
    scaled <- (old_variable - 1) / (10 - 1)  # map 1..10 onto 0..1
    log(scaled / (1 - scaled))               # logit: map 0..1 onto -Inf..Inf
}

Then it looks like you want to transform one column of your data and store the result back in that same column. You don't need lapply for that: scale_log_function is vectorized, so you can pass the column vector directly:

test[[12]] <- scale_log_function(test[[12]])

Of course, as others noted, you'll have to deal with problem cases: an old_variable of 1 gives log(0) = -Inf, and an old_variable of 10 gives log(1/0) = Inf.

Brian Syzdek
  • Thank you for the answer! How do I deal with log(0)? How can I remove those values so that they are not used in the ridge and linear regression? – Jona Cardamone Jan 20 '22 at 17:11
  • I'd recommend that you consider what could give you exceptional cases, like log(0), and see what kind of output you get. The range endpoints, 0's, and values outside your expected range are good to test, like: ```scale_log_function(c(1, 10, 0, -5, 11, 5))``` gives ```[1] -Inf Inf NaN NaN NaN -0.2231436``` (with a "NaNs produced" warning). You may want to filter the data you're using when you get results you don't want to include. – Brian Syzdek Jan 20 '22 at 17:27
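One way to act on that filtering suggestion follows. It assumes the transformed target sits in a column `z` of a data frame; `df`, its columns, and its values are hypothetical. `is.finite()` is `FALSE` for `-Inf`, `Inf`, `NaN`, and `NA`, so keeping only rows where it is `TRUE` removes every problem case before the model is fit.

```r
# Hypothetical data: predictor x and target y on a 1-10 scale
df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7),
                 y = c(1, 3, 4, 6, 7, 9, 10))

# Scale with the known 1-10 range and apply the logit
p <- (df$y - 1) / (10 - 1)
df$z <- log(p / (1 - p))   # y = 1 gives -Inf, y = 10 gives Inf

# is.finite() is FALSE for -Inf, Inf, NaN and NA
keep <- is.finite(df$z)
df_clean <- df[keep, ]

fit <- lm(z ~ x, data = df_clean)
print(nrow(df_clean))  # 5
print(coef(fit))
```

Dropping rows also discards every observation at the ends of the 1 to 10 scale, which can bias the fit, so it is worth checking how many rows are lost before relying on this approach.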