
I've just started working with R to do my data manipulation and analysis after years of using IgorPro, which no one in their right mind would spend as much time writing scripts in as I have. There's clearly a conceptual disconnect between the two that's causing me trouble, though.

I want to write a function that will take whatever column in a dataframe I feed it and scale it from 0 to 1. The critical thing here is that I want the rescaled data to wind up IN the dataframe. In my IgorPro frame of mind, this is easy:

normalize<-function(col){
   col<-col/min(col)
}

If I pass in testdf$testcol and print the result, the scaling works, but the results are not incorporated into the dataframe. A little research suggests that this is because my function operates in a local environment, and in order to modify things outside that environment, it needs to be connected to the global environment.

Modified:

normalize<-function(col){
  col<-col/min(col)
  assign("col",col,envir=.GlobalEnv)
}

But of course this just spits out a new vector named col and doesn't help me in my endeavor to overwrite the non-scaled data.
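A minimal sketch of why the `assign()` version fails, assuming a small made-up data frame `testdf`. `assign()` creates a brand-new global variable literally named `col`. The column inside `testdf` is never touched, because the function only received a copy of the column's values.

```r
# Example data; the values are hypothetical
testdf <- data.frame(testcol = c(-4, -2, -1))

normalize <- function(col) {
  col <- col / min(col)
  # Creates (or overwrites) a global variable named "col";
  # it has no link back to testdf$testcol
  assign("col", col, envir = .GlobalEnv)
}

normalize(testdf$testcol)

print(col)             # the rescaled copy: 1.00 0.50 0.25
print(testdf$testcol)  # unchanged: -4 -2 -1
```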

Short of reassigning the column name to the rescaled data, which defeats the point of writing a function to do this, how can I use the arguments in the function to assign the function output to actual dataframes?

Final note: I appreciate any input that involves using packages that would do this for me, but I have a lot more data manipulation to do, and I'd like to be able to write my own functions rather than having to find packages for everything, so bonus points if you can help me understand how to write the function myself rather than pointing me to built-in functions elsewhere.

catalandres
Clare
  • you could pass the entire data frame and the column index (or name) to the function, and perform the calculation – Barranka Jun 07 '16 at 19:22
  • Your `normalize` function will not scale your data to [0,1], but rather to [1, infinity] – alexwhitworth Jun 07 '16 at 20:33
  • You appear to be confused about pass-by-reference semantics vs pass-by-value. [R does not pass-by-reference](http://stackoverflow.com/questions/2603184/r-pass-by-reference) – alexwhitworth Jun 07 '16 at 20:41
  • Using min rather than max because the instrument I'm using arbitrarily spits out all the data as negative values, so I'm flipping and re-scaling at the same time. – Clare Jun 08 '16 at 15:48
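As the comments point out, dividing by `min()` lands in (0, 1] only when every value is negative, and in [1, max/min] when every value is positive. A minimal sketch of standard min-max scaling, which maps any non-constant numeric vector onto exactly [0, 1]. The function name `minmax` and its `flip` argument are hypothetical; `flip = TRUE` reverses the direction so the most negative reading maps to 1, loosely mirroring the sign flip described above.

```r
# Min-max scaling: smallest value -> 0, largest value -> 1.
# 'flip' (hypothetical option) reverses the direction so the smallest
# (most negative) reading maps to 1 and the largest maps to 0.
# A constant vector gives NaN, because its range is zero.
minmax <- function(x, flip = FALSE) {
  rng <- range(x, na.rm = TRUE)
  out <- (x - rng[1]) / (rng[2] - rng[1])
  if (flip) 1 - out else out
}

readings <- c(-4, -2, -1)        # made-up instrument data
minmax(readings)                 # 0, 2/3, 1
minmax(readings, flip = TRUE)    # 1, 1/3, 0
```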

3 Answers


Here is a typical base R method for applying functions to multiple columns of a dataset. Say you have a data.frame df, and you want to scale all of its columns:

normalize <- function(x) x / min(x)

Now use lapply to run through your data.frame:

df[] <- lapply(df, normalize)

Note that you need the [] to maintain the data.frame structure. Now, suppose you have some categorical variables that you don't want to touch:

df[sapply(df, is.numeric)] <- lapply(df[sapply(df, is.numeric)], normalize)

Or to apply the function to a selected set of variables:

df[, c("var1", "var2", "var5")] <- lapply(df[, c("var1", "var2", "var5")], normalize)
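A runnable sketch of the selective approach, using a made-up data frame with one factor column. Indexing columns with `df[cols]` (no comma) keeps the result a data frame even when only one column is selected, whereas `df[, cols]` drops a single selected column to a plain vector.

```r
normalize <- function(x) x / min(x)

# Hypothetical data: one categorical and two numeric columns
df <- data.frame(group = factor(c("a", "b", "c")),
                 var1  = c(2, 4, 8),
                 var2  = c(-3, -6, -12))

num_cols <- sapply(df, is.numeric)               # named logical: FALSE TRUE TRUE
df[num_cols] <- lapply(df[num_cols], normalize)  # factor column is left alone

str(df)
```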

A popular package that might be worth checking out is data.table. It can be a lot faster than base R for many tasks.

Here is one method to do this in data.table:

library(data.table)
setDT(df)

df[, names(df) := lapply(.SD, normalize)]
lmo

Another popular approach is using the dplyr package:

library(dplyr)
df <- df %>% mutate(col = col / min(col))

would replace col in the dataframe df. Another (base R) option is to use transform:

df <- transform(df, col = col / min(col))

Note, however, that transform is intended mainly for interactive use; it is not recommended inside functions.
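Inside your own functions, plain `[[` subsetting with the column name passed as a string avoids the non-standard evaluation that `transform()` relies on. A minimal sketch that also follows the earlier comment's suggestion of passing the whole data frame plus a column name; the helper name `rescale_by_min` and the example data are hypothetical.

```r
# Hypothetical helper: takes the whole data frame plus a column *name*,
# rescales that column with standard [[ ]] subsetting, and returns the
# modified data frame. R functions work on a copy, so the caller must
# assign the result back.
rescale_by_min <- function(df, colname) {
  stopifnot(colname %in% names(df))
  df[[colname]] <- df[[colname]] / min(df[[colname]])
  df
}

# Made-up example data
df <- data.frame(col = c(-4, -2, -1), label = c("x", "y", "z"))
df <- rescale_by_min(df, "col")
df$col  # 1.00 0.50 0.25
```

Because the column name is an ordinary string, the same helper can be called in a loop, e.g. `for (nm in c("var1", "var2")) df <- rescale_by_min(df, nm)`.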

David_B

The other answers cover how to rescale the column, but there is something more general you need to know, beyond the particular solution to this case.

The essential reason your code does not work is that R functions operate on a copy of their arguments: the function has to return the object you are manipulating, and you have to assign that returned value back yourself.

normalize<-function(col){
  col<-col/min(col)
  return(col)
}
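A short usage sketch with a made-up `testdf`: the returned value only ends up in the data frame once it is assigned to the column.

```r
normalize <- function(col) {
  col <- col / min(col)
  return(col)
}

testdf <- data.frame(testcol = c(-4, -2, -1))  # hypothetical data

normalize(testdf$testcol)                    # prints the result; testdf is unchanged
testdf$testcol <- normalize(testdf$testcol)  # replaces the column in testdf
testdf$testcol                               # 1.00 0.50 0.25
```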
catalandres