
I am trying to rename columns of multiple data.frames.

To give an example, let's say I have a list of data.frames dfA, dfB and dfC. I wrote a function ChangeNames to set the names accordingly and then used lapply as follows:

dfs <- list(dfA, dfB, dfC)
ChangeNames <- function(x) {
    names(x) <- c("A", "B", "C" )  
}
lapply(dfs, ChangeNames)

However, this doesn't work as expected. It seems that I am not assigning the new names to the data.frames, only creating the new names. What am I doing wrong here?

Thank you in advance!

Arun
user2706593
  • After the line `names(x) <-` in your function, add `return(x)` or simply `x`. Else, you're returning just `names(x)`. – Arun Aug 22 '13 at 09:09
  • Thank you for your reply, Arun! If I add return(x), I get a printout of dfA, dfB and dfC with the new names. But if I check names(dfA), names(dfB) and names(dfC) afterwards, they still have the old column names. My data frames are also very large, so I am not interested in viewing them, only in changing their column names. – user2706593 Aug 22 '13 at 09:14
  • `lapply` does not modify the input. There's no "change by reference" happening here. Everything is being done on a copy. You'll have to assign the result back. do: `dfs <- lapply(dfs, ChangeNames)` – Arun Aug 22 '13 at 09:15
  • Ok, dfs is now one big list containing dfA, dfB and dfC with the new column names. I still want to work with dfA, dfB and dfC individually, and individually they still have the old column names. How do I assign the result back to the individual data frames? – user2706593 Aug 22 '13 at 09:22
  • well, you should assign them back. `dfA <- dfs[[1]]`... ? – Arun Aug 22 '13 at 09:24
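A minimal sketch of assigning all three back at once instead of one by one. It assumes the list is built with names (here via `mget()`), so that `list2env()` can write each element back under its original name. The example data frames are made up for illustration.

```r
# Hypothetical example objects standing in for the question's dfA, dfB, dfC
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(x = 7:8, y = 9:10, z = 11:12)
dfC <- data.frame(x = 0:1, y = 2:3, z = 4:5)

# mget() collects the objects into a list named after them ("dfA", "dfB", "dfC")
dfs <- mget(c("dfA", "dfB", "dfC"), envir = environment())

# lapply() keeps the list names; setNames() returns a renamed copy of each data frame
dfs <- lapply(dfs, function(d) setNames(d, c("A", "B", "C")))

# list2env() writes each element back under its list name, overwriting the originals
# (environment() is the global environment when this runs at top level)
invisible(list2env(dfs, envir = environment()))

names(dfA)  # "A" "B" "C"
```

This still copies each data frame; it only saves writing `dfA <- dfs[[1]]` and so on by hand.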

3 Answers


There are two things here:

  • 1) Your function should return the value you want. Otherwise, R returns the value of the last evaluated expression, which in your case is the assignment names(x) <- c("A", "B", "C"). Its value is just the character vector c("A", "B", "C"), not the data frame. So add return(x), or simply x, as the final line. Your function would then look like:

    ChangeNames <- function(x) {
        names(x) <- c("A", "B", "C" )
        return(x)
    }
    
  • 2) lapply does not modify your input objects by reference; it works on a copy. So you have to assign the results back. Alternatively, use a for-loop instead of lapply:

    # option 1
    dfs <- lapply(dfs, ChangeNames)
    
    # option 2
    for (i in seq_along(dfs)) {
        names(dfs[[i]]) <- c("A", "B", "C")
    }
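Both points can be checked side by side in a small self-contained sketch. The example data frames are made up, and `ChangeNamesBad` is a hypothetical name for the question's original function.

```r
# Hypothetical stand-ins for dfA, dfB, dfC
dfA <- data.frame(x = 1:2, y = 3:4, z = 5:6)
dfB <- data.frame(p = 1:2, q = 3:4, r = 5:6)
dfC <- data.frame(u = 1:2, v = 3:4, w = 5:6)
dfs <- list(dfA, dfB, dfC)

# Original version: the last expression is the assignment, whose value is the names vector
ChangeNamesBad <- function(x) {
  names(x) <- c("A", "B", "C")
}
bad <- lapply(dfs, ChangeNamesBad)
bad[[1]]          # a character vector, not a data frame

# Fixed version returns the modified copy of the data frame
ChangeNames <- function(x) {
  names(x) <- c("A", "B", "C")
  x
}
dfs <- lapply(dfs, ChangeNames)
names(dfs[[1]])   # "A" "B" "C"
names(dfA)        # still "x" "y" "z": lapply never touched the original object
```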
    

Even with the for-loop, a copy is still made, because names(.) <- . itself copies the object. You can verify this with tracemem:

df <- data.frame(x=1:5, y=6:10, z=11:15)
tracemem(df)
# [1] "<0x7f98ec24a480>"
names(df) <- c("A", "B", "C")
tracemem(df)
# [1] "<0x7f98e7f9e318>"

If you want to modify by reference, you can use data.table package's setnames function:

df <- data.frame(x=1:5, y=6:10, z=11:15)
library(data.table)
tracemem(df)
# [1] "<0x7f98ec76d7b0>"
setnames(df, c("A", "B", "C"))
tracemem(df)
# [1] "<0x7f98ec76d7b0>"

Note that the memory address of df hasn't changed: the names were modified by reference.

Arun
  • Using this as a function to change column names across multiple data frames contained in a list like this was incredibly helpful. I generalized the function to take a second and third argument, and used that as the input for a `grep()` to change names of specific columns within all of my data frames. – ano Aug 12 '15 at 15:41
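The comment doesn't show the generalized function, so here is one hypothetical version of the idea: `grep()` finds the matching columns and `sub()` rewrites only those names. The function name, arguments and example data are assumptions for illustration.

```r
# Hypothetical generalization: rewrite only the column names matching a regex
ChangeNamesMatching <- function(x, pattern, replacement) {
  hits <- grep(pattern, names(x))                              # indices of matching names
  names(x)[hits] <- sub(pattern, replacement, names(x)[hits])  # rewrite just those names
  x                                                            # return the modified copy
}

dfs <- list(
  data.frame(id = 1:2, score_2013 = 3:4),
  data.frame(id = 5:6, score_2014 = 7:8)
)

# Extra named arguments are passed through lapply() to the function
dfs <- lapply(dfs, ChangeNamesMatching, pattern = "^score_.*", replacement = "score")
names(dfs[[1]])  # "id" "score"
```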

If the data frames are not in a list but are separate objects in the global environment, you can refer to them by name using a character vector.

dfs <- c("dfA", "dfB", "dfC")

for(df in dfs) {
  df.tmp <- get(df)
  names(df.tmp) <- c("A", "B", "C" ) 
  assign(df, df.tmp)
}

EDIT

To simplify the above code you could use

for(df in dfs)
  assign(df, setNames(get(df),  c("A", "B", "C")))

or, using data.table, whose setnames modifies the objects by reference and so doesn't require reassigning:

for(df in dfs)
  data.table::setnames(get(df),  c("A", "B", "C"))
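The loops above work at top level because `assign()` writes to the global environment there. If you wrap the same idea in a function, `assign()` defaults to the function's own environment and the renamed copy is lost when the function returns. A minimal sketch with a hypothetical helper, `rename_by_name`, that passes the caller's environment explicitly:

```r
# Hypothetical helper: rename columns of data frames referred to by name
rename_by_name <- function(obj_names, new_names, envir = parent.frame()) {
  for (nm in obj_names) {
    # Without envir = envir, assign() would write into this function's local frame
    assign(nm, setNames(get(nm, envir = envir), new_names), envir = envir)
  }
  invisible(NULL)
}

dfA <- data.frame(x = 1:3, y = 4:6, z = 7:9)
dfB <- data.frame(x = 1:3, y = 4:6, z = 7:9)
rename_by_name(c("dfA", "dfB"), c("A", "B", "C"))
names(dfB)  # "A" "B" "C"
```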
JWilliman

I had to import a public data set, then rename each data frame and clean up every column name in each data frame: trim whitespace, convert to lowercase, and replace internal spaces with periods.

Combining the above methods got me:

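library(stringr)

# dfs is a character vector of data frame names, as in the previous answer
for (eachdf in dfs) {
  df.tmp <- get(eachdf)
  for (eachcol in seq_along(df.tmp)) {
    # trim first, so leading/trailing spaces are not turned into periods
    colnames(df.tmp)[eachcol] <-
      str_replace_all(str_to_lower(str_trim(colnames(df.tmp)[eachcol])), " ", ".")
  }
  assign(eachdf, df.tmp)
}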
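The inner loop over columns isn't strictly needed, since these string functions are vectorized over all names at once. A base-R sketch of the same cleaning without stringr, assuming R 3.2.0 or later for `trimws()`; `clean_names` and the example column names are hypothetical:

```r
# Hypothetical messy column names of the kind described above
d <- data.frame(a = 1:2, b = 3:4)
names(d) <- c("  First Name ", "Zip Code")

# Trim first, then lowercase, then turn the remaining internal spaces into periods
clean_names <- function(nms) gsub(" ", ".", tolower(trimws(nms)), fixed = TRUE)

names(d) <- clean_names(names(d))
names(d)  # "first.name" "zip.code"
```

The same `clean_names()` call can replace the inner loop in the `get()`/`assign()` loop above.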
TDog