118

Someone should have asked this already, but I couldn't find an answer. Say I have:

x = data.frame(q=1,w=2,e=3, ...and many many columns...)  

What is the most elegant way to rename an arbitrary subset of columns, whose positions I don't necessarily know, to some other arbitrary names?

For example, say I want to rename "q" and "e" to "A" and "B". What is the most elegant code to do this?

Obviously, I can do a loop:

oldnames = c("q","e")
newnames = c("A","B")
for(i in 1:2) names(x)[names(x) == oldnames[i]] = newnames[i]

But is there a better way, maybe using a package (plyr::rename, etc.)?

Jaap
qoheleth

21 Answers

134

With dplyr you would do:

library(dplyr)

df = data.frame(q = 1, w = 2, e = 3)
    
df %>% rename(A = q, B = e)

#  A w B
#1 1 2 3

Or if you want to use vectors, as suggested by @Jelena-bioinf:

library(dplyr)

df = data.frame(q = 1, w = 2, e = 3)

oldnames = c("q","e")
newnames = c("A","B")

df %>% rename_at(vars(all_of(oldnames)), ~ newnames)

#  A w B
#1 1 2 3

L. D. Nicolas May suggested a change, given that rename_at() has been superseded by rename_with():

df %>% 
  rename_with(~ newnames[match(.x, oldnames)], .cols = all_of(oldnames))

#  A w B
#1 1 2 3
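A sketch of two more variants, assuming a recent dplyr (the rename() documentation shows the lookup-vector pattern from 1.1.0 on, as far as I know). The names of the hypothetical `lookup` vector are the new names and its values the old ones; backticks take care of non-syntactic names such as `2012 (%)`.

```r
library(dplyr)

df <- data.frame(q = 1, w = 2, e = 3)

# Lookup vector: names are the NEW column names, values are the OLD ones
lookup <- c(A = "q", B = "e")

# all_of() errors if any old name is missing from df
renamed <- df %>% rename(all_of(lookup))
print(names(renamed))

# any_of() silently skips old names that are not present
lookup_extra <- c(A = "q", B = "e", Z = "not_a_column")
renamed_any <- df %>% rename(any_of(lookup_extra))
print(names(renamed_any))

# Non-syntactic names (spaces, parentheses, leading digits) need backticks
df_pct <- data.frame(`2012 (%)` = 0.5, check.names = FALSE)
df_pct <- df_pct %>% rename(`2012` = `2012 (%)`)
print(names(df_pct))
```

rename() keeps every column in its original position, so only the listed names change.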
Gorka
  • the user asked about passing `old` and `new` names as vectors, I think – JelenaČuklina Mar 23 '18 at 12:41
  • Thanks @Jelena-bioinf. I amended the answer to include your suggestion. – Gorka Mar 24 '18 at 16:08
  • Could you please explain the meaning of the ~(tilde) and where ".x" comes from in the rename_with example? – petzi Oct 26 '20 at 12:27
  • `rename_with` can use either a function or a formula to rename all columns given as the `.cols` argument. For example `rename_with(iris, toupper, starts_with("Petal"))` is equivalent to `rename_with(iris, ~ toupper(.x), starts_with("Petal"))` . – Paul Rougieux Dec 17 '20 at 09:32
  • Unclear, terrible syntax, this solution is just terrible, suppose I have to rename a column called "2012 (%)" in "2012": trying to guess what your solution means in real life based on this example is just impossible. rename() is just terrible in general. – Matteo Bulgarelli Apr 03 '23 at 14:24
128

setnames() from the data.table package works on both data.frames and data.tables:

library(data.table)
d <- data.frame(a=1:2,b=2:3,d=4:5)
setnames(d, old = c('a','d'), new = c('anew','dnew'))
d


 #   anew b dnew
 # 1    1 2    4
 # 2    2 3    5

Note that the names are changed by reference, so no copy is made (even for data.frames!).
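If some of the old names might not exist, setnames() also has a skip_absent argument (mentioned in the comments). A minimal sketch, assuming a data.table version recent enough to have that argument; "e" here is a deliberately absent column:

```r
library(data.table)

d <- data.frame(a = 1:2, b = 2:3, d = 4:5)

# "e" is not a column of d; skip_absent = TRUE skips that old/new pair
# instead of raising an error
setnames(d, old = c("a", "d", "e"), new = c("anew", "dnew", "enew"),
         skip_absent = TRUE)
print(names(d))
```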

mnel
  • For late arrivals here - Also take a look at [Joel's answer](http://stackoverflow.com/a/36010381/4606130) below which covers checking for existing columns in case you have a list of name changes which may not all be present e.g. `old = c("a", "d", "e")` – micstr Nov 07 '16 at 13:29
  • I wonder, does this work if you only wish to rename a subset / some of the columns instead of all of them? So if I had a data frame of ten columns and wished to rename _id_firstname to firstname and _id_lastname to lastname but leave the remaining eight columns untouched, can I do this or do I have to list all columns? – Mus Jul 09 '18 at 12:43
  • @MusTheDataGuy you supply the subset of new and old names, and it will work. – mnel Jul 16 '18 at 13:20
  • @mnel I need to change the variable names of a subset as @Mus asked. However, the code above did not work for a subset of data. @Gorka's answer with ```rename_at()``` worked for changing variable names of a subset. – Mehmet Yildirim Jan 31 '20 at 18:06
  • @micstr `skip_absent=TRUE` :) – bers Oct 25 '21 at 08:41
44

Another solution for data frames without too many columns, since you have to list every column name (building on @thelatemail's answer):

x <- data.frame(q=1,w=2,e=3)

> x
  q w e
1 1 2 3

colnames(x) <- c("A","w","B")

> x
  A w B
1 1 2 3

Alternatively, you can also use:

names(x) <- c("C","w","D")

> x
  C w D
1 1 2 3

You can also rename a subset of the column names by position:

names(x)[2:3] <- c("E","F")

> x
  C E F
1 1 2 3
Jaap
33

Here is the most efficient way I have found to rename multiple columns using a combination of purrr::set_names() and a few stringr operations.

library(tidyverse)

# Make a tibble with bad names
data <- tibble(
    `Bad NameS 1` = letters[1:10],
    `bAd NameS 2` = rnorm(10)
)

data 
# A tibble: 10 x 2
   `Bad NameS 1` `bAd NameS 2`
   <chr>                 <dbl>
 1 a                    -0.840
 2 b                    -1.56 
 3 c                    -0.625
 4 d                     0.506
 5 e                    -1.52 
 6 f                    -0.212
 7 g                    -1.50 
 8 h                    -1.53 
 9 i                     0.420
10 j                     0.957

# Use purrr::set_names() with an anonymous function of stringr operations
data %>%
    set_names(~ str_to_lower(.) %>%
                  str_replace_all(" ", "_") %>%
                  str_replace_all("bad", "good"))

# A tibble: 10 x 2
   good_names_1 good_names_2
   <chr>               <dbl>
 1 a                  -0.840
 2 b                  -1.56 
 3 c                  -0.625
 4 d                   0.506
 5 e                  -1.52 
 6 f                  -0.212
 7 g                  -1.50 
 8 h                  -1.53 
 9 i                   0.420
10 j                   0.957
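For comparison, a rough base R equivalent of the same pipeline. It assumes a plain data.frame built with check.names = FALSE so that the bad names survive; the gsub() calls use fixed = TRUE because the patterns here are literal strings:

```r
# Same bad names as the tibble above, as a plain data.frame
data <- data.frame(`Bad NameS 1` = letters[1:10],
                   `bAd NameS 2` = rnorm(10),
                   check.names = FALSE)

nm <- tolower(names(data))                  # like str_to_lower()
nm <- gsub(" ", "_", nm, fixed = TRUE)      # like str_replace_all(" ", "_")
nm <- gsub("bad", "good", nm, fixed = TRUE) # like str_replace_all("bad", "good")
names(data) <- nm
print(names(data))
```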
Matt Dancho
  • This should be the answer, but could you should also probably expand on what the `~` and `.` arguments in the `set_names()` pipe do. – DaveRGP May 24 '18 at 13:41
  • In some cases, you need to explicitly type `purrr::set_names()`. – Levi Baguley Feb 11 '20 at 20:56
  • @DaveRGP when using `purrr` functions, the tilde `~` means "for each column". The `.` is dplyr syntax for LHS = left hand side of the pipe, i.e the reference to the object which is piped, in this case `data`. – Agile Bean May 19 '20 at 07:52
  • The tilde `~` is a formula. You can also use a function call and pass the arguments to the `...` argument of `set_names` for example `rlang::set_names(head(iris), paste0, "_hi")` is equivalent to `rlang::set_names(head(iris), ~ paste0(.x, "_hi"))`. – Paul Rougieux Dec 17 '20 at 09:54
  • `purrr::set_names()` got me today. thanks Levi! – taiyodayo Feb 25 '22 at 00:11
28

Update dplyr 1.0.0

dplyr 1.0.0 added rename_with(), where _with refers to taking a function as input. The trick is to turn the character vector newnames into a formula with ~, so that it is equivalent to function(x) return(newnames).

In my subjective opinion, that is the most elegant dplyr expression. Update: thanks to @desval, the oldnames vector should be wrapped in all_of(), because passing an external vector directly to a selection is ambiguous and deprecated in tidyselect:

# shortest & most elegant expression
df %>% rename_with(~ newnames, all_of(oldnames))

A w B
1 1 2 3

Side note:

If you reverse the order of the arguments, you must name them, since .fn is expected before .cols. Either name .fn:

df %>% rename_with(all_of(oldnames), .fn = ~ newnames)

A w B
1 1 2 3

or name .cols:

df %>% rename_with(.cols = all_of(oldnames), ~ newnames)

A w B
1 1 2 3
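A fully self-contained version of the call above, with every object defined (a sketch, assuming dplyr 1.0.0 or later):

```r
library(dplyr)

df <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("q", "e")
newnames <- c("A", "B")

# ~ newnames ignores the current names (.x) and simply returns newnames,
# so newnames must be listed in the same order as oldnames
out <- df %>% rename_with(~ newnames, all_of(oldnames))
print(names(out))
```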
Agile Bean
  • it looks like this answer returns a warning at present, and will return an error in the future, because of the ambiguity when using an external vector inside select https://tidyselect.r-lib.org/reference/faq-external-vector.html. This should fix it ```df %>% rename_with(~ newnames, all_of(oldnames))``` – desval Jan 24 '22 at 13:50
  • Could you provide a concrete example? I can't get any replacement for `newnames` or `oldnames` to work. – FLonLon Jun 23 '22 at 14:27
14

I recently ran into this myself. If you're not sure whether the columns exist and only want to rename those that do:

existing <- match(oldNames,names(x))
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
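To make the behaviour concrete, here is a small run where one old name, the hypothetical "z", is not a column of x:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldNames <- c("q", "e", "z")   # "z" is not a column of x
newNames <- c("A", "B", "Z")

existing <- match(oldNames, names(x))
print(existing)   # position of each old name in x, NA if absent

# Drop the NA positions and the new names that belong to them
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
print(names(x))
```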
JoelKuiper
10

Building on @user3114046's answer:

x <- data.frame(q=1,w=2,e=3)
oldnames <- c("q","e")
newnames <- c("A","B")
x
#  q w e
#1 1 2 3

names(x)[match(oldnames, names(x))] <- newnames

x
#  A w B
#1 1 2 3

This does not rely on any particular ordering of the columns in x, because match() looks up each old name by value.
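A quick check of that claim, with the old names deliberately listed out of column order, and the logical-mask version (names(x) %in% oldnames) for contrast:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("e", "q")   # deliberately NOT in column order
newnames <- c("B", "A")

# match() looks up each old name, so each new name lands on the right column
by_match <- x
names(by_match)[match(oldnames, names(by_match))] <- newnames
print(names(by_match))   # "A" "w" "B"

# A logical %in% mask follows column order instead, so the names get swapped
by_mask <- x
names(by_mask)[names(by_mask) %in% oldnames] <- newnames
print(names(by_mask))    # "B" "w" "A"
```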

thelatemail
  • I have upvoted your answer, but I still wonder if there is an even more elegant way to do this, particularly methods that rename by name, instead of by position – qoheleth Jan 08 '14 at 05:34
  • @qoheleth - it is renaming by name! There is no input here that is a positional vector as `match` takes care of that. The best you're going to do is probably @mnel's `setnames` answer. – thelatemail Jan 08 '14 at 05:47
  • it is still sort of renaming by position because, as you said, even though I don't have to explicitly specify a position vector, `match` is still a position oriented command. In this spirit, I deemed @user3114046's answer position based as well (even thought the `%in%` command takes care (or tries to) of things). Of course, I suppose you can argue all commands are position oriented when we drill down to the low level mechanism.... but that's not what I mean... the data.table answer is great because there is no multiple calling of the `name` commands. – qoheleth Jan 08 '14 at 05:56
7

You can use a named vector. Below are two options (base R and dplyr).

Base R, via subsetting:

x = data.frame(q = 1, w = 2, e = 3) 

rename_vec <- c(q = "A", e = "B")
## vector of same length as names(x) which returns NA if there is no match to names(x)
which_rename <- rename_vec[names(x)]
## simple ifelse where names(x) will be renamed for every non-NA 
names(x) <- ifelse(is.na(which_rename), names(x), which_rename)

x
#>   A w B
#> 1 1 2 3
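The base R version can be wrapped in a small helper. rename_by_lookup() below is a hypothetical name, not an existing function. Entries of the lookup vector that have no matching column are simply never looked up, so they are ignored:

```r
# Hypothetical helper: names(lookup) are OLD names, values are NEW names
rename_by_lookup <- function(df, lookup) {
  hit <- lookup[names(df)]   # NA where a column has no entry in lookup
  names(df) <- unname(ifelse(is.na(hit), names(df), hit))
  df
}

x <- data.frame(q = 1, w = 2, e = 3)
# "z" has no matching column, so it has no effect
y <- rename_by_lookup(x, c(q = "A", e = "B", z = "Z"))
print(names(y))
```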

Or a dplyr option with !!!:

library(dplyr)

rename_vec <- c(A = "q", B = "e") # the names are just the other way round than in the base R way!

x %>% rename(!!!rename_vec)
#>   A w B
#> 1 1 2 3

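The latter works because the 'big bang' operator !!! forces evaluation of a list or vector and splices its elements in as separate arguments, so rename(!!!rename_vec) behaves like rename(A = "q", B = "e"). From the help page: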

?`!!`

!!! forces-splice a list of objects. The elements of the list are spliced in place, meaning that they each become one single argument.

tjebo
  • don't understand how this works - `!!!oldnames` returns `c("A", "B")` but which logic transforms this into `c("A", "w", "B")`?? – Agile Bean May 19 '20 at 07:34
  • @AgileBean I don't know where you found that !!!oldnames would return a vector. It is used to force non-standard evaluation of multiple arguments in dplyr. see `?\`!!\`` `Use \`!!!\` to add multiple arguments to a function. Its argument should evaluate to a list or vector: args <- list(1:3, na.rm = TRUE) ; quo(mean(!!!args))`. I think I'll add this explanation to the answer. Cheers for bringing it up – tjebo May 19 '20 at 09:09
5
names(x)[names(x) %in% c("q","e")]<-c("A","B")
James King
  • Not quite, because as I said, I don't necessarily know the position of the columns, your solution only works if `oldnames` is sorted so that `oldnames[i]` occurs before `oldnames[j]` for i < j – qoheleth Jan 08 '14 at 05:24
5

A few answers already mention dplyr::rename_with and rlang::set_names, but separately. This answer illustrates the differences between the two, and the use of functions and formulas to rename columns.

rename_with from the dplyr package can use either a function or a formula to rename a selection of columns given as the .cols argument. For example, passing the function toupper:

library(dplyr)
rename_with(head(iris), toupper, starts_with("Petal"))

is equivalent to passing the formula ~ toupper(.x):

rename_with(head(iris), ~ toupper(.x), starts_with("Petal"))

When renaming all columns, you can also use set_names from the rlang package. For a different example, let's use paste0 as the renaming function. paste0 needs a second argument here, so there are different ways to pass it depending on whether we use a function or a formula:

rlang::set_names(head(iris), paste0, "_hi")
rlang::set_names(head(iris), ~ paste0(.x, "_hi"))

The same can be achieved with rename_with by passing the data frame as the first argument .data, the function as the second argument .fn, all columns as the third argument .cols = everything(), and "_hi" through the dots (...). Alternatively, you can put the extra argument inside a formula given as .fn and rely on the default .cols = everything().

rename_with(head(iris), paste0, everything(), "_hi")
rename_with(head(iris), ~ paste0(.x, "_hi"))

rename_with only works with data frames. set_names is more generic and can also rename vectors:

rlang::set_names(1:4, c("a", "b", "c", "d"))
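If you would rather avoid the extra packages, base R's stats::setNames() and plain names<- assignment cover the same cases. A sketch of base R counterparts to the examples above:

```r
df <- head(iris)

# Base R counterpart of rlang::set_names(df, paste0, "_hi")
df_hi <- setNames(df, paste0(names(df), "_hi"))
print(names(df_hi))

# Base R counterpart of rename_with(df, toupper, starts_with("Petal"))
df_up <- df
petal <- startsWith(names(df_up), "Petal")
names(df_up)[petal] <- toupper(names(df_up)[petal])
print(names(df_up))

# setNames() also names plain vectors, like rlang::set_names(1:4, ...)
v <- setNames(1:4, c("a", "b", "c", "d"))
print(v)
```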
Paul Rougieux
4

This would change all the occurrences of those letters in all names:

 names(x) <- gsub("q", "A", gsub("e", "B", names(x) ) )
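One caveat: gsub() without anchors also rewrites matching letters inside longer names. A sketch with a hypothetical extra column date, showing how ^ and $ restrict the replacement to whole names:

```r
x <- data.frame(q = 1, w = 2, e = 3, date = 4)

# Unanchored patterns also hit letters inside longer names
unanchored <- gsub("q", "A", gsub("e", "B", names(x)))
print(unanchored)   # "A" "w" "B" "datB"

# ^ and $ anchor the pattern, so only the whole names "q" and "e" change
anchored <- gsub("^q$", "A", gsub("^e$", "B", names(x)))
print(anchored)     # "A" "w" "B" "date"
```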
IRTFM
  • I don't think this is particularly elegant once you get past a couple of rename instances. – thelatemail Jan 08 '14 at 05:13
  • I'm just not good enough to whip up a `gsubfn` answer. Perhaps G.Grothendieck will come by. He is the regex-meister. – IRTFM Jan 08 '14 at 05:17
3

You can get the names, save them as a character vector, do your bulk renaming on the strings, and assign them back. A good example is when you reshape a dataset from long to wide:

labWide
      Lab1    Lab10    Lab11    Lab12    Lab13    Lab14    Lab15    Lab16
1 35.75366 22.79493 30.32075 34.25637 30.66477 32.04059 24.46663 22.53063

nameVec <- names(labWide)
nameVec <- gsub("Lab","LabLat",nameVec)

names(labWide) <- nameVec
names(labWide)
"LabLat1"  "LabLat10" "LabLat11" "LabLat12" "LabLat13" "LabLat14" "LabLat15" "LabLat16"
Jaap
3

If the table contains two columns that had the same name (for example oldname.x and oldname.y after a merge or join), you can rename both like this:

rename(df, newname1 = oldname.x, newname2 = oldname.y)
Cyrus Mohammadian
varun
2

Side note: if you want to prepend one string to all of the column names, you can use this simple code:

colnames(df) <- paste("renamed_",colnames(df),sep="")
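The same idea works for a subset of columns if you build a logical mask first. A sketch, with q and e as the columns to prefix:

```r
df <- data.frame(q = 1, w = 2, e = 3)

# Prefix only the selected columns; using the same mask on both sides
# keeps each new name on the column it came from
to_prefix <- names(df) %in% c("q", "e")
names(df)[to_prefix] <- paste0("renamed_", names(df)[to_prefix])
print(names(df))
```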
Corey Levinson
1

Lots of sort-of-answers, so I just wrote a function you can copy/paste.

rename <- function(x, old_names, new_names) {
    stopifnot(length(old_names) == length(new_names))
    # pull out the names that are actually in x
    old_nms <- old_names[old_names %in% names(x)]
    new_nms <- new_names[old_names %in% names(x)]

    # call out the column names that don't exist
    not_nms <- setdiff(old_names, old_nms)
    if(length(not_nms) > 0) {
        msg <- paste(paste(not_nms, collapse = ", "), 
            "are not columns in the dataframe, so won't be renamed.")
        warning(msg)
    }

    # rename; match() keeps each new name aligned with its old name
    names(x)[match(old_nms, names(x))] <- new_nms
    x
}

 x = data.frame(q = 1, w = 2, e = 3)
 rename(x, c("q", "e"), c("Q", "E"))

   Q w E
 1 1 2 3
Dan
0

If one row of the data contains the names you want to give to all columns, you can do:

names(data) <- as.character(unlist(data[row, ]))

where data is your data frame and row is the number of the row containing the new names.

Then you can remove that row with:

data <- data[-row,]
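A small self-contained run, assuming (hypothetically) that the first row holds the names. unlist() flattens the one-row data frame into a plain vector; the remaining columns stay character, so you may need to convert them afterwards:

```r
# Hypothetical data whose first row holds the intended column names
data <- data.frame(V1 = c("id", "1", "2"),
                   V2 = c("score", "10", "20"),
                   stringsAsFactors = FALSE)
row <- 1

names(data) <- as.character(unlist(data[row, ]))
data <- data[-row, ]   # drop the row that held the names
print(data)
```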
CaffeineConnoisseur
0

Here is a function for this: pass your data frame as rename(x), and it renames every listed column that exists without raising an error for those that don't:

rename <-function(x){
  oldNames = c("a","b","c")
  newNames = c("d","e","f")
  existing <- match(oldNames,names(x))
  names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
  return(x)
}
Zuti
  • this seems to be the same as [JoelKuiper's answer](https://stackoverflow.com/a/36010381/2204410), but then reframed as function ..... – Jaap Apr 09 '19 at 06:57
0

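Many good answers above use specialized packages. This is a simple way of doing it with base R only.

df.rename.cols <- function(df, col2.list) {
  # col2.list is a list of c(old, new) pairs
  old <- vapply(col2.list, `[`, character(1), 1)
  new <- vapply(col2.list, `[`, character(1), 2)

  # match() places each new name on the column holding its old name
  names(df)[match(old, names(df))] <- new

  df
}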

Here is an example:

df1 <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6), D = c(7, 8))
col.list <- list(c("A", "NewA"), c("C", "NewC"))
df.rename.cols(df1, col.list)

  NewA B NewC D
1    1 3    5 7
2    2 4    6 8
Soldalma
0

If execution time matters, I suggest using the data.table structure:

> library(microbenchmark)
> library(data.table)
> library(dplyr)
> df = data.table(x = 1:10, y = 3:12, z = 4:13)
> oldnames = c("x","y","z")
> newnames = c("X","Y","Z")
> microbenchmark(dplyr_1 = df %>% rename_at(vars(oldnames), ~ newnames) ,
+                dplyr_2 = df %>% rename(X=x,Y=y,Z=z) ,
+                data_tabl1= setnames(copy(df), old = c("x","y","z") , new = c("X","Y","Z")),
+                times = 100) 
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
    dplyr_1 5760.3 6523.00 7092.538 6864.35 7210.45 17935.9   100
    dplyr_2 2536.4 2788.40 3078.609 3010.65 3282.05  4689.8   100
 data_tabl1  170.0  218.45  368.261  243.85  274.40 12351.7   100
A. chahid
0

I recently built on @agile bean's answer (using rename_with, formerly rename_at) to write a function that renames columns if they exist in the data frame, so that the column names of heterogeneous data frames can be made to match each other where applicable.

The looping can surely be improved, but I figured I'd share it for posterity.

Create an example data frame:
x= structure(list(observation_date = structure(c(18526L, 18784L, 
17601L), class = c("IDate", "Date")), year = c(2020L, 2021L, 
2018L)), sf_column = "geometry", agr = structure(c(id = NA_integer_, 
common_name = NA_integer_, scientific_name = NA_integer_, observation_count = NA_integer_, 
country = NA_integer_, country_code = NA_integer_, state = NA_integer_, 
state_code = NA_integer_, county = NA_integer_, county_code = NA_integer_, 
observation_date = NA_integer_, time_observations_started = NA_integer_, 
observer_id = NA_integer_, sampling_event_identifier = NA_integer_, 
protocol_type = NA_integer_, protocol_code = NA_integer_, duration_minutes = NA_integer_, 
effort_distance_km = NA_integer_, effort_area_ha = NA_integer_, 
number_observers = NA_integer_, all_species_reported = NA_integer_, 
group_identifier = NA_integer_, year = NA_integer_, checklist_id = NA_integer_, 
yday = NA_integer_), class = "factor", .Label = c("constant", 
"aggregate", "identity")), row.names = c("3", "3.1", "3.2"), class = "data.frame")
Function:
library(dplyr)

match_col_names <- function(x){

  # each element: new name = vector of possible old names
  col_names <- list(date      = c("observation_date", "date"),
                    C         = c("observation_count", "count", "routetotal"),
                    yday      = c("dayofyear"),
                    latitude  = c("lat"),
                    longitude = c("lon", "long")
                    )

  for(i in seq_along(col_names)){
    newname  <- names(col_names)[i]
    oldnames <- col_names[[i]]

    toreplace <- names(x)[names(x) %in% oldnames]
    # nothing to rename for this entry, so skip it
    if (length(toreplace) == 0) next

    x <- x %>%
      rename_with(~ newname, all_of(toreplace))
  }

  return(x)
}

Apply the function:
x <- match_col_names(x)
Jessica Burnett
0

A base R way using setNames(), relying on the fact that [ with character indices returns the first match:

names(x) <- setNames(c(newnames, names(x)), c(oldnames, names(x)))[names(x)]

names(x) <- (\(.) setNames(c(newnames, .), c(oldnames, .))[.])(names(x)) #Variant

x
#  A w B
#1 1 2 3

Or using transform():

names(x) <- do.call(transform, c(list(as.list(setNames(names(x), names(x)))),
                                 as.list(setNames(newnames, oldnames))))

Data

x = data.frame(q=1,w=2,e=3)
oldnames = c("q","e")
newnames = c("A","B")
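To see why the first match matters, it helps to look at the intermediate lookup vector. A step-by-step sketch of the first setNames() line above, with the same data:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("q", "e")
newnames <- c("A", "B")

# Lookup table: the renamed entries come first, followed by every
# current name mapped to itself as a fallback
lookup <- setNames(c(newnames, names(x)), c(oldnames, names(x)))
print(lookup)

# Character indexing returns the FIRST element with a matching name,
# so "q" and "e" hit the new names and "w" falls through to itself
names(x) <- lookup[names(x)]
print(names(x))
```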
GKi