I would like to normalize values to the range 0 to 1 using this function:

range01 <- function(x){(x-min(x))/(max(x)-min(x))}

If I use it on a single case, it works fine.

However, I would like to apply it to a whole data frame, so that it finds the min and max of all columns and performs the normalization. I tried this:

data.frame(apply(df[2:ncol(df)], 2, range01))

However, it doesn't give the expected results. Any idea whether the apply call should be different?

Nathalie
  • You are currently applying it in each column separately. To do it on the data frame as a whole, simply do `range01(df[-1])` – Sotos Feb 07 '20 at 11:56
  • Related https://stackoverflow.com/q/5468280/680068 – zx8754 Feb 07 '20 at 12:00
  • Also, `data.frame(apply(...))` is a bad idea, much better is `df[-1] <- range01(df[-1])`. – Rui Barradas Feb 07 '20 at 12:03
  • Could you post a data sample? In my mind what could be going wrong is that using the apply function would generate a matrix and then you're trying to coerce it to a data frame. But I'm not sure without any sample data. – Luís Telles Feb 07 '20 at 12:16
  • It would also be nice to specify how exactly the output is not matching your expectations. – Luís Telles Feb 07 '20 at 12:16
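
To make the distinction raised in the comments concrete, here is a minimal sketch comparing column-wise scaling with scaling by one global min and max. The data frame and its column names (`id`, `x`, `y`) are made up for illustration, and it assumes every column after the first is numeric.

```r
range01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical data: an identifier column followed by two numeric columns
df <- data.frame(id = c("a", "b", "c"), x = c(0, 5, 10), y = c(10, 20, 50))

# Column-wise: each column is scaled by its own min and max
per_column <- df
per_column[-1] <- lapply(df[-1], range01)

# Whole data frame: a single min and max shared by all numeric columns
global <- df
global[-1] <- range01(df[-1])

per_column
global
```

In `per_column` every numeric column spans 0 to 1 on its own, while in `global` only the overall smallest and largest values map to 0 and 1. Which of the two is "expected" depends on what the question intends.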

2 Answers

Here is another option. I always like the mutate_at and mutate_all functions from dplyr to apply functions across different columns.

#your function
range01 <- function(x){(x-min(x))/(max(x)-min(x))}

#some data
set.seed(1)
df <- data.frame(a = runif(10,1,5), b = runif(10,2,10))


library(dplyr)
mutate_all(df, range01)
#>            a          b
#> 1  0.2307452 0.03608002
#> 2  0.3515024 0.00000000
#> 3  0.5788577 0.62607041
#> 4  0.9586953 0.25454974
#> 5  0.1584522 0.72764475
#> 6  0.9475749 0.39387104
#> 7  1.0000000 0.66359501
#> 8  0.6784675 1.00000000
#> 9  0.6425811 0.24955981
#> 10 0.0000000 0.73697057
AndS.
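
As a side note, in dplyr 1.0.0 and later the scoped verbs such as `mutate_all` are superseded by `across()`. A rough equivalent of the call above, assuming a recent dplyr is installed and reusing the same example data, might look like this:

```r
library(dplyr)

range01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Same example data as in the answer above
set.seed(1)
df <- data.frame(a = runif(10, 1, 5), b = runif(10, 2, 10))

# Apply range01 to every column, one column at a time
scaled <- mutate(df, across(everything(), range01))

# If the data frame also holds non-numeric columns (for example an id column),
# restricting the selection is one option:
# mutate(df, across(where(is.numeric), range01))
```

Like `mutate_all`, this scales each column by its own min and max rather than by a global one.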
Maybe you can try the code below, which defines a custom function `normalize`:

normalize <- Vectorize(function(v) (v-min(v))/diff(range(v)))
dfout <- data.frame(normalize(df))

Example

set.seed(1)
df <- data.frame(a = runif(10,1,5), b = runif(10,2,10))

> df
          a        b
1  2.062035 3.647797
2  2.488496 3.412454
3  3.291413 7.496183
4  4.632831 5.072830
5  1.806728 8.158731
6  4.593559 5.981594
7  4.778701 7.740948
8  3.643191 9.935249
9  3.516456 5.040281
10 1.247145 8.219562

and then you will get

> dfout
           a          b
1  0.2307452 0.03608002
2  0.3515024 0.00000000
3  0.5788577 0.62607041
4  0.9586953 0.25454974
5  0.1584522 0.72764475
6  0.9475749 0.39387104
7  1.0000000 0.66359501
8  0.6784675 1.00000000
9  0.6425811 0.24955981
10 0.0000000 0.73697057
ThomasIsCoding
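
For readers wondering why the `Vectorize` version works on a data frame: a data frame is a list of columns, so the vectorized function ends up being called once per column. The sketch below checks that against a plain `sapply` over the columns; the names `via_vectorize` and `via_sapply` are only illustrative.

```r
range01 <- function(x) (x - min(x)) / (max(x) - min(x))
normalize <- Vectorize(function(v) (v - min(v)) / diff(range(v)))

# Same example data as in the answer above
set.seed(1)
df <- data.frame(a = runif(10, 1, 5), b = runif(10, 2, 10))

# Vectorize() dispatches to mapply(), which iterates over the columns of df
via_vectorize <- data.frame(normalize(df))

# The same column-wise scaling written out explicitly
via_sapply <- data.frame(sapply(df, range01))

all.equal(via_vectorize, via_sapply)
```

Both approaches normalize each column separately, so neither uses a single min and max for the whole data frame.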