
I have seen these solutions:

  1. Scaling a numeric matrix in R with values 0 to 1
  2. Range standardization (0 to 1) in R

However, those methods do not work when NA values are present.

I have tried this:

s = append(sort(rexp(100)),rep(NA,30))
o = data.frame(s,s)

range01 <- function(x) {
    if (!is.na(x)) {
        return(NA)
    } else {
        y = (x - min(x)) / (max(x) - min(x))
        return(y)
    }
}

xo = apply(o,MARGIN = 2, FUN = range01)

But it doesn't work... Suggestions?

The solution should work on data frames via the apply function.

Kaervas
  • Shouldn't `if (!is.na(x))` be `if (is.na(x))`? Otherwise you are returning `NA` for all non-`NA` input, and trying to do calculations on `NA` input. – nrussell Aug 10 '15 at 17:55
  • @nrussell You are correct. `!is.na()` returns values that aren't `NA` because of the not operator `!`. – N8TRO Aug 10 '15 at 17:57
  • I cannot use `na.rm = TRUE`, because I have to keep the data frame as it is... but thanks! – Kaervas Aug 10 '15 at 18:01
  • @Kaervas Don't edit your question with the answer. Either accept the answer below or, if that does not work, add your own answer below and accept it. The community should vote on what it thinks is the best answer. – MrFlick Aug 10 '15 at 18:18

1 Answer


Here's the answer from the 2nd question you link:

function(x) {(x - min(x)) / (max(x) - min(x))}

We can modify this to work with NAs, using the built-in NA handling in min and max:

stdize = function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}

Then you can call it and pass na.rm = TRUE through.

x = rexp(100)
x[sample(1:100, size = 10)] <- NA
stdize(x)  # all NA: min() and max() return NA when x contains NA
stdize(x, na.rm = TRUE)  # works: NAs are ignored when computing min and max
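As a small illustration of why the two calls differ, here is a sketch on a made-up four-element vector (the vector `v` and the names `without_rm` and `with_rm` are only for this example). By default `min` and `max` return `NA` as soon as one element is missing, and that `NA` then spreads through the arithmetic:

```r
v <- c(2, 4, NA, 10)   # illustrative vector, not from the question

min(v)                 # NA
min(v, na.rm = TRUE)   # 2
max(v, na.rm = TRUE)   # 10

# Without na.rm every element becomes NA, because min(v) and max(v) are NA.
without_rm <- (v - min(v)) / (max(v) - min(v))

# With na.rm = TRUE only the originally missing element stays NA.
with_rm <- (v - min(v, na.rm = TRUE)) /
  (max(v, na.rm = TRUE) - min(v, na.rm = TRUE))

without_rm   # NA NA NA NA
with_rm      # 0, 0.25, NA, 1
```

Note that `na.rm = TRUE` only affects how the minimum and maximum are computed; it does not drop anything from `v` itself, so the result has the same length as the input.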

Or, using the o data frame from your question:

o_std = lapply(o, stdize, na.rm = TRUE)

lapply returns a list with one standardized vector per column. The NAs will still be there at the end, in their original positions.
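If the result should be a matrix (as with `apply`) or stay a data frame, something along these lines should work. This is only a sketch: it rebuilds `o` as in the question, the seed is arbitrary, and the names `xo_mat` and `o_df` are made up for the example.

```r
# Same helper as above; extra arguments are forwarded to min() and max().
stdize <- function(x, ...) {
  (x - min(x, ...)) / (max(x, ...) - min(x, ...))
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
s <- append(sort(rexp(100)), rep(NA, 30))
o <- data.frame(s, s)

# apply() forwards na.rm = TRUE to stdize and returns a numeric matrix.
xo_mat <- apply(o, MARGIN = 2, FUN = stdize, na.rm = TRUE)

# Assigning into o_df[] keeps the data.frame class and the column names.
o_df <- o
o_df[] <- lapply(o, stdize, na.rm = TRUE)

range(o_df[[1]], na.rm = TRUE)   # 0 1
sum(is.na(o_df[[1]]))            # 30, the same NAs as in the input
```

The `o_df[] <- lapply(...)` form is one common way to transform every column while keeping the data frame's shape; `apply` is fine too when a matrix result is acceptable.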

Gregor Thomas
  • It works for a vector... but not for a dataset... try: `stdize = function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}; x = rexp(100); x[sample(1:100, size = 10)] <- NA; x = data.frame(x,x); xo = apply(x,MARGIN = 2, FUN = stdize)` – Kaervas Aug 10 '15 at 18:06
  • Added example, you still have to specify `na.rm = T`. You can pass it through `apply` or `lapply`. – Gregor Thomas Aug 10 '15 at 18:07