I have a data frame with n columns, all of them numeric, like the one below (the example has only 3 columns, but the actual one has an unknown number).

col_1 col_2 col_3 
1      3     7   
3      8     9   
5      5     2 
8      10    1
11     9     2 

I'm trying to transform every column with this equation: (x - min(col)) / (max(col) - min(col)), so that each element is scaled relative to the values in its own column.

Is there a way to do this without using a for loop to iterate through every column? Would sapply or tapply work here?

Nakul Upadhya

2 Answers

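We can use scale on the dataset. Note that with its default arguments scale centers each column on its mean and divides by its standard deviation, which is standardization rather than the min-max formula in the question.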

scale(df1)
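To get the min-max result out of scale itself, one option is to pass the column minimums as center and the column ranges as scale. This is a minimal sketch, assuming df1 looks like the example in the question; the names mins, ranges and scaled_df are only illustrative.

```r
# Sample data rebuilt from the question's example (an assumption)
df1 <- data.frame(col_1 = c(1, 3, 5, 8, 11),
                  col_2 = c(3, 8, 5, 10, 9),
                  col_3 = c(7, 9, 2, 1, 2))

# Per-column minimum and range (max - min)
mins   <- sapply(df1, min)
ranges <- sapply(df1, max) - mins

# scale subtracts `center` and then divides by `scale`, column by column
scaled <- scale(df1, center = mins, scale = ranges)

# scale returns a matrix; convert back if a data frame is needed
scaled_df <- as.data.frame(scaled)
```

The vectors passed to center and scale need one value per column, in column order.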

Or, if we want a custom function, define it on a single vector, apply it to every column with lapply, and assign the result back to the data frame:

f1 <- function(x) (x - min(x)) / (max(x) - min(x))
df1[] <- lapply(df1, f1)
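As a quick check, here is a minimal sketch that applies the min-max formula from the question to its sample data. The data frame is rebuilt from the example table and min_max is just an illustrative name. It also shows sapply, which the question asks about: it works, but simplifies the result to a matrix instead of a data frame.

```r
# Sample data rebuilt from the question's example (an assumption)
df1 <- data.frame(col_1 = c(1, 3, 5, 8, 11),
                  col_2 = c(3, 8, 5, 10, 9),
                  col_3 = c(7, 9, 2, 1, 2))

# Min-max scaling of one numeric vector (min_max is an illustrative name)
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# lapply returns a list; assigning into scaled[] keeps the data frame structure
scaled <- df1
scaled[] <- lapply(df1, min_max)

# sapply applies the same function but simplifies the result to a matrix
scaled_mat <- sapply(df1, min_max)

scaled$col_1  # 0.0 0.2 0.4 0.7 1.0
```

If a column is constant, max and min are equal, the denominator is zero and the result is NaN, so that case may need separate handling.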

Or this can be done with mutate_all from dplyr (superseded by across() since dplyr 1.0.0, but it still works):

library(dplyr)
df1 %>%
    mutate_all(f1)
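With a recent dplyr (1.0.0 or later), the same step can be written with across(). A minimal sketch, again assuming the question's sample data; min_max stands in for the scaling function and is an illustrative name.

```r
library(dplyr)

# Sample data rebuilt from the question's example (an assumption)
df1 <- data.frame(col_1 = c(1, 3, 5, 8, 11),
                  col_2 = c(3, 8, 5, 10, 9),
                  col_3 = c(7, 9, 2, 1, 2))

# Min-max scaling of one numeric vector
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply to every column
scaled <- df1 %>%
    mutate(across(everything(), min_max))

# Or restrict to numeric columns, in case non-numeric ones are present
scaled_num <- df1 %>%
    mutate(across(where(is.numeric), min_max))
```

Since every column in the example is numeric, both calls give the same result here.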
akrun

To complement @akrun's answer, you can also do this with data.table:

library(data.table)
setDT(df)
df[, lapply(.SD, function(x) (x - min(x)) / (max(x) - min(x)))]

If you want to transform only a subset of columns, you can use the .SDcols argument, e.g.

library(data.table)
df[, lapply(.SD, function(x) (x - min(x)) / (max(x) - min(x))),
   .SDcols = c('a','b')]
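The calls above return a new table containing only the transformed columns. If the goal is to overwrite those columns in place and keep the others, one possible variation uses :=. This is a sketch under assumptions: the table dt, its columns a, b and id, and the helper min_max are all made up for illustration.

```r
library(data.table)

# Hypothetical table: two numeric columns to scale and one to leave alone
dt <- data.table(a  = c(1, 3, 5, 8, 11),
                 b  = c(3, 8, 5, 10, 9),
                 id = letters[1:5])

# Min-max scaling of one numeric vector
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Columns to transform
cols <- c("a", "b")

# := updates the listed columns by reference; id is untouched
dt[, (cols) := lapply(.SD, min_max), .SDcols = cols]
```

The parentheses around cols tell data.table to use the names stored in the variable rather than create a column literally called cols.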
linog