I want to add multiple empty columns to multiple data frames. For a single data frame, I know this works: df[,namevector] <- NA (taken from another question), where namevector is a vector containing the names of the empty variables to add. I have a list of data frames, so I thought the following code would do the trick.

a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
c <- list(a,b)
namevector <- c("z","w")     

EmptyVariables <- function(df) {df[,namevector] <- NA}
sapply(X = c, FUN = EmptyVariables)

I don't get an error message, but these 2 lines of code also don't add the empty columns.

1053Inator
  • You didn't assign the results to a symbol in the global environment, so they only existed inside that `sapply` call and were then marked for garbage collection. The `sapply` function would NOT have changed the original dataframes, though. Welcome to functional programming. – IRTFM Jan 15 '15 at 18:18
  • There was an additional issue that I didn't recognize and that was that the returned value from the 'EmptyVariables' function was NA. It should have been defined as `<- function(df) {df[,namevector] <- NA; df}` – IRTFM Jan 15 '15 at 18:38

1 Answer

The comments from BondedDust already contain the solution, but some additional explanation might help.

Why did your original code not work? There are two things to be said about this:

  • as BondedDust mentioned, the assignment inside the function EmptyVariables is done in the environment of the function. Thus, only a local copy of the data frame df is changed, but not the data frame that exists in the global environment. Calling EmptyVariables(a) leaves a unchanged.
  • a function returns the value of the last expression it evaluates. The last expression in EmptyVariables is an assignment, and in R an assignment evaluates (invisibly) to the assigned value, here NA, not to the modified data frame. This is why your call to sapply simply gives you NA twice. The solution to this has already been pointed out by BondedDust: the function body should be {df[,namevector] <- NA;df}. In this case, the changed data frame is returned as the result of the function.
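
Both points can be checked directly on a single data frame. The following is a minimal sketch that reuses the objects from the question; the names out, fixed and EmptyVariablesFixed are introduced here only for illustration.

```r
a <- data.frame(x = 1:10, y = 21:30)
namevector <- c("z", "w")

# Original version: the last expression is an assignment
EmptyVariables <- function(df) {df[, namevector] <- NA}

out <- EmptyVariables(a)
out        # NA: the value of the assignment, not a data frame
names(a)   # "x" "y": the global a is untouched

# Fixed version: the modified local copy is returned explicitly
EmptyVariablesFixed <- function(df) {df[, namevector] <- NA; df}

fixed <- EmptyVariablesFixed(a)
names(fixed)  # "x" "y" "z" "w"
names(a)      # still "x" "y", the result has to be assigned to be kept
```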

Also a comment regarding sapply: this function tries to simplify its result to a vector or matrix. A list of data frames cannot reasonably be simplified in this way, so you should use lapply, which always returns a list.
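
To see the difference, one can apply the corrected function with both sapply and lapply. This is only a sketch: the list is called dfs here (a name not used in the question) so that it does not mask the base function c(), and s and l are likewise illustrative names.

```r
a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
dfs <- list(a, b)
namevector <- c("z", "w")

EmptyVariables <- function(df) {df[, namevector] <- NA; df}

# sapply tries to simplify: the two data frames are flattened into
# a matrix of mode list, with one cell per column
s <- sapply(X = dfs, FUN = EmptyVariables)
is.matrix(s)   # TRUE
dim(s)         # 4 2

# lapply keeps one data frame per list element
l <- lapply(X = dfs, FUN = EmptyVariables)
is.data.frame(l[[1]])  # TRUE
names(l[[1]])          # "x" "y" "z" "w"
```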

Finally, this is the code that should do what you want:

EmptyVariables <- function(df) {df[,namevector] <- NA;df}
res <- lapply(X = c, FUN = EmptyVariables)

res will be a list containing two data frames. Thus, res[[1]] and res[[2]] will give you a and b with the empty columns added, respectively.
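
If you would rather access the results by name, or overwrite the original data frames, one possible approach is to name the list elements first. The sketch below assumes a named list called dfs, which is not part of the original code.

```r
a <- data.frame(x = 1:10, y = 21:30)
b <- data.frame(x = 1:10, y = 31:40)
dfs <- list(a = a, b = b)   # named list (hypothetical name dfs)
namevector <- c("z", "w")

EmptyVariables <- function(df) {df[, namevector] <- NA; df}
res <- lapply(X = dfs, FUN = EmptyVariables)

names(res)   # "a" "b": lapply keeps the names of the input list

# Overwrite the originals only if you really want to replace them
a <- res$a
b <- res$b
names(a)     # "x" "y" "z" "w"
```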

Stibu
  • Thanks a lot. I figured it out based on BondedDust's explanation, but it's nice to have a more elaborate explanation for future reference. – 1053Inator Jan 16 '15 at 12:20