
I'm trying to write a function in R that drops columns from a data frame and returns the new data with a name specified as an argument of the function:

drop <- function(my.data, col, new.data) {
  new.data <<- my.data[, -col]
  return(new.data)
}

So in the above example, I want a new data frame to exist after the function is called that is named whatever the user inputs as the third argument.

When I call the function, the correct data frame is returned, but if I then try to use the new data frame in the global environment I get "object not found". I thought that by using the <<- operator I was defining new.data globally.

Can someone help me understand what's going on and if there is a way to accomplish this?

I found this and this that seemed related, but neither quite answered my question.

cerpintaxt
  • You could `assign(new.data, my.data[,-col], envir = .GlobalEnv)`, although I would recommend against this whole idea – Jake Burkhead Mar 14 '14 at 18:12
  • It looks like your function requires more typing than explicitly doing the call directly. What is the point? Also assigning things using `<<-` from within a function is terrible practice. – Dason Mar 14 '14 at 18:13
  • You are trying to write a function with a side effect. R is a functional language and thus functions shouldn't have side effects. – Roland Mar 14 '14 at 18:16
  • @Dason ah, good to know that <<- shouldn't be used in a function, thanks. My actual function is longer than this; I was just using this as an easy example. It does save a lot of typing. – cerpintaxt Mar 14 '14 at 18:18

2 Answers


Use the assign() function.

  assign("new.data", my.data[,-col], envir = .GlobalEnv) 

The first argument should be a string. As written, the resulting global variable will be named "new.data". If new.data is instead a function argument that holds the desired name as a string, drop the quotes so that assign() uses its value as the name.
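As a minimal sketch of the second case, assuming the caller passes the desired name as a string: the function name `drop_cols` and the argument name `new.name` are hypothetical and not from the question. `drop = FALSE` is added here so the result stays a data frame even if only one column is left.

```r
# Hypothetical helper: drops columns by position and stores the result
# in the global environment under the name held in `new.name`.
drop_cols <- function(my.data, col, new.name) {
  stopifnot(is.character(new.name), length(new.name) == 1)
  # drop = FALSE keeps a data frame even when a single column remains
  result <- my.data[, -col, drop = FALSE]
  # new.name is unquoted, so its value (a string) becomes the variable name
  assign(new.name, result, envir = .GlobalEnv)
  invisible(result)
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
drop_cols(df, 2, "df_trimmed")
names(df_trimmed)
```

After the call, `df_trimmed` should exist in the global environment with columns `a` and `c`.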

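<<- does not always assign to the global environment. It starts in the enclosing environment of the function and walks up through the parent environments looking for an existing variable with that name; only if none is found does it create one in the global environment. Either way, the variable it writes to is literally named new.data, not whatever value was passed in as the third argument.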
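A small illustration of that lookup rule, using a closure: `make_counter` and `count` are made-up names for this sketch only. Because `count` already exists in the enclosing function's environment, `<<-` updates it there and no global variable is created.

```r
make_counter <- function() {
  count <- 0
  function() {
    # `count` is found in the enclosing environment of this inner function,
    # so `<<-` modifies that binding rather than creating a global one
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
counter()

# Check whether a variable called `count` was created globally
exists("count", envir = .GlobalEnv, inherits = FALSE)
```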

In general, however, it is better to return things from a function than set global variables from inside a function. The latter is a lot harder to debug.
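A sketch of the return-based alternative, with the caller choosing the name through ordinary assignment: `drop_cols_pure`, `scores`, and `scores_small` are illustrative names, not part of the question.

```r
# Hypothetical side-effect-free version: it only returns the trimmed data
drop_cols_pure <- function(my.data, col) {
  # drop = FALSE keeps a data frame even when a single column remains
  my.data[, -col, drop = FALSE]
}

scores <- data.frame(id = 1:3, math = c(90, 85, 70), art = c(60, 75, 95))

# The caller decides what the result is called
scores_small <- drop_cols_pure(scores, 3)
names(scores_small)
```

The original `scores` is left untouched, and nothing is written outside the caller's own assignment.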

Christopher Louden

One case where this is useful is when you work interactively in the RStudio console and do a lot of text mining. For example, if you have a large corpus and want to break it up into sub-corpora based on themes, wrapping the processing in a function that produces a cleaned corpus can be much faster. An example is below:

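library(tm)  # provides Corpus, VectorSource, tm_map and the cleaning transformations

processText <- function(inputText, corpName) {
  # Build a corpus from a character vector, then clean it step by step
  outputName <- Corpus(VectorSource(inputText))
  outputName <- tm_map(outputName, PlainTextDocument)
  outputName <- tm_map(outputName, removeWords, stopwords("english"))
  outputName <- tm_map(outputName, removePunctuation)
  outputName <- tm_map(outputName, removeNumbers)
  outputName <- tm_map(outputName, stripWhitespace)
  # Store the cleaned corpus in the global environment under the name in corpName
  assign(corpName, outputName, envir = .GlobalEnv)
  return(corpName)
}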

In the case above, I pass the text column of the data frame as inputText and the desired name of the output corpus, as a string, as corpName. Processing a batch of text data then takes a single call:

processText(retail$Essay, "retailCorp")

The new corpus "retailCorp" then shows up in the global environment for further work such as plotting word clouds. I can also send lists through the function and get many corpora back.
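For the list case, one option that avoids creating many global variables is to keep the results in a named list. The sketch below is only an illustration: `clean_text` is a hypothetical base-R stand-in for the tm pipeline (it strips punctuation, digits, and extra whitespace from plain character vectors), and the sample texts are made up.

```r
# Hypothetical stand-in for the tm cleaning steps, using base R only
clean_text <- function(x) {
  x <- gsub("[[:punct:]]", "", x)   # remove punctuation
  x <- gsub("[[:digit:]]", "", x)   # remove numbers
  trimws(gsub("\\s+", " ", x))      # collapse and trim whitespace
}

# Made-up input: one character vector per theme
texts <- list(
  retail  = c("Great  prices, 100% recommended!", "Long queues..."),
  banking = c("Fees went up by 3 dollars.")
)

# lapply keeps the list names, so each cleaned result stays labelled
cleaned <- lapply(texts, clean_text)
names(cleaned)
cleaned$retail
```

Each theme is then available as `cleaned$retail`, `cleaned$banking`, and so on, without any assignment into the global environment from inside a function.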

Bryan Butler