64

(Somewhat related question: Enter new column names as string in dplyr's rename function)

In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower, gsub, etc.).

library(tidyr); library(dplyr)
data(iris)
# This is what I want to do, but I'd like to use dplyr syntax
names(iris) <- tolower( gsub("\\.", "_", names(iris) ) )
glimpse(iris, 60)
# Observations: 150
# Variables:
#   $ sepal_length (dbl) 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6,...
#   $ sepal_width  (dbl) 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4,...
#   $ petal_length (dbl) 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4,...
#   $ petal_width  (dbl) 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3,...
#   $ species      (fctr) setosa, setosa, setosa, setosa, s...

# the rest of the chain:
iris %>% gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) 

I see ?rename takes the argument replace as a named character vector, with new names as values, and old names as names.

So I tried:

iris %>% rename(replace=c(names(iris)=tolower( gsub("\\.", "_", names(iris) ) )  ))

but this (a) returns Error: unexpected '=' in iris %>% ... and (b) requires referencing by name the data frame from the previous operation in the chain, which in my real use case I couldn't do.

iris %>% 
  rename(replace=c(    )) %>% # ideally the fix would go here
  gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) # I realize I could mutate down here 
                                     #  instead, once the column names turn into values, 
                                     #  but that's not the point
# ---- Desired output looks like: -------
# Source: local data frame [12 x 3]
# Groups: species
# 
#       species  measurement avg_value
# 1      setosa sepal_length     5.006
# 2      setosa  sepal_width     3.428
# 3      setosa petal_length     1.462
# 4      setosa  petal_width     0.246
# 5  versicolor sepal_length     5.936
# 6  versicolor  sepal_width     2.770
# ... etc ....  
C8H10N4O2
  • 10
    The elegant approach is: `iris %>% \`names<-\`(.,tolower( gsub("\\.", "_", names(.) ) ))` (I'm only joking.) – Frank May 21 '15 at 19:51
  • Some functions used in the answers below have been deprecated. `rename_with` is the latest dplyr verb to programmatically rename variables with a function. See answer below. – Paul Rougieux Mar 17 '21 at 08:45

8 Answers

57

This is a very late answer, written in May 2017.

As of dplyr 0.5.0.9004, soon to be 0.6.0, many new ways of renaming columns that work with the magrittr pipe operator %>% have been added to the package.

Those functions are:

  • rename_all
  • rename_if
  • rename_at

There are many different ways of using those functions, but the one relevant to your problem, using the stringr package, is the following:

df <- df %>%
  rename_all(
      funs(
        stringr::str_to_lower(.) %>%
        stringr::str_replace_all(., '\\.', '_')
      )
  )

And so, carry on with the plumbing :) (no pun intended).
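As the comments on this answer point out, funs() was later deprecated. A minimal sketch of the same renaming with a one-sided formula instead, assuming dplyr 0.8.0 or later and an installed stringr (the variable name renamed is only for illustration):

```r
library(dplyr)

# Assumption: dplyr >= 0.8.0, where a ~ formula can replace funs().
# .x stands for the vector of current column names.
renamed <- iris %>%
  rename_all(~ stringr::str_replace_all(stringr::str_to_lower(.x), "\\.", "_"))

names(renamed)
```

The dot is escaped as "\\." because stringr treats the pattern as a regular expression.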

Guilherme Marthe
  • 16
    Good to know, thanks. Also worth noting, you can do `df %<>% foo()` as shorthand for `df <- df %>% foo()` – C8H10N4O2 May 03 '17 at 14:02
  • 2
    Due to the new dplyr update where they changed how `funs()` works (really wish they hadn't), you need to substitute `list` for `funs` and place a tilde ~ before the function e.g. `list(~str_replace(., to_replace, replacement))` – MokeEire Jul 10 '19 at 23:13
38

I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with dplyr::rename:

iris %>% rename_(.dots=setNames(names(.), tolower(gsub("\\.", "_", names(.)))))
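Since rename_() has been deprecated in later releases, here is a hedged sketch of the same named-vector idea with plain rename(). It assumes dplyr 1.0.0 or later, where rename() accepts a named character vector wrapped in all_of() (new names as the vector's names, old names as its values); the variable name renamed is only for illustration:

```r
library(dplyr)

# Assumption: dplyr >= 1.0.0 (tidyselect's all_of() inside rename()).
# setNames() builds c(sepal_length = "Sepal.Length", ...) from the piped-in data.
renamed <- iris %>%
  rename(all_of(setNames(names(.), tolower(gsub("\\.", "_", names(.))))))

names(renamed)
```

Note that this is the reverse of plyr::rename, where the old names are the vector's names and the new names are its values.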
Matthew Plourde
  • 2
    You can put `.` in place of `iris` in its latter appearances. – Frank May 21 '15 at 19:50
  • This is very useful, why you had to use `rename_` instead of `rename`? – Konrad Jan 13 '16 at 17:25
  • Habit, since I mostly use dplyr programmatically – Matthew Plourde Jan 13 '16 at 17:27
  • 1
    @Konrad Actually, I don't have the doc in front of me, but I think the nonsafe version doesn't have the .dots argument – Matthew Plourde Jan 13 '16 at 17:28
  • @MatthewPlourde thanks very much for the useful comment. – Konrad Jan 13 '16 at 17:30
  • 2
    FYI: `rename_` is *slowly* being [deprecated](https://github.com/hadley/dplyr/blob/b89a5ca40a038bc620c98638a4e12c353e5a4528/R/manip.r#L413). I haven't found an obvious replacement, though @Frank's use of `setNames` seems the most direct (if not provided by `dplyr`). – r2evans Apr 02 '17 at 16:13
30

Here's a way around the somewhat awkward rename syntax:

myris <- iris %>% setNames(tolower(gsub("\\.","_",names(.))))
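Dropped into the chain from the question, this might look like the sketch below. It assumes tidyr and dplyr are installed; the variable name result is only for illustration:

```r
library(tidyr)
library(dplyr)

result <- iris %>%
  # rename in the middle of the chain; "." is the data frame piped in
  setNames(tolower(gsub("\\.", "_", names(.)))) %>%
  gather(measurement, value, -species) %>%
  group_by(species, measurement) %>%
  summarise(avg_value = mean(value))

result
```

Because setNames() only sees ".", nothing in the chain has to refer to the data frame by name.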
Frank
  • Another dependency for a workaround? This is getting more esoteric. – Anton May 21 '15 at 20:01
  • You can replace `setnames` with `setNames` and drop the call to `data.table`. – Matthew Plourde May 21 '15 at 20:01
  • @MatthewPlourde Do you know of a reason to prefer the longer `rename` over the simpler route? Your answer looks like `rename_(.dots=this_answer)`, right? The help page for `rename` does not advertise modification by reference as `setnames` from `data.table` does. – Frank May 21 '15 at 20:07
  • @Anton A fair point, but that's the nature of workarounds. (Thanks to Matthew's comment, the dependency is gone again.) I feel like the dplyr syntax should be extended to support the OP's expectations (based on plyr), like `rename(replace_all=...)`. Seems deficient if constructing a named list and knowing to pass it to weird argument `.dots` is required here. – Frank May 21 '15 at 20:14
  • @Frank nope, this is what I would do. Hopefully my answer clarifies for OP how to use `rename` properly. I think you can simplify this further by dropping the first dot. – Matthew Plourde May 21 '15 at 20:20
  • 2
    @Frank I wound up using your answer (+1) because it is a simpler way to do what I wanted -- and taught me about setNames-- but @MatthewPlourde more literally answered the question as written (i.e. using `rename`). Thanks for your time! – C8H10N4O2 May 21 '15 at 20:38
  • @C8H10N4O2 Yeah, I think we're all on the same page :) I learned from his answer (and his comments on my answer) as well. – Frank May 21 '15 at 20:50
  • hm, I can't find `setNames` in `dplyr`. Somehow works on `iris`, but not on my dataset. Throws `unused argument (gsub("1/", "", names(.)))` instead. – JelenaČuklina Jan 25 '16 at 16:57
  • @Jelena-bioinf Every vanilla copy of R includes `setNames` as a function in the base stats package. Try `?setNames`. I can't tell where the problem is based on that error message on its own. – Frank Jan 25 '16 at 18:08
24

As of 2020, rename_if, rename_at and rename_all are marked superseded. The up-to-date dplyr way to tackle this is rename_with():

iris %>% rename_with(tolower)

or a more complex version:

iris %>% 
  rename_with(stringr::str_replace, 
              pattern = "Length", replacement = "len", 
              matches("Length"))

(edit 2021-09-08)
As mentioned in a comment by @a_leemo, this notation is not mentioned in the manual verbatim. Rather, one would deduce the following from the manual:

iris %>% 
  rename_with(~ stringr::str_replace(.x, 
                                     pattern = "Length", 
                                     replacement = "len"), 
              matches("Length")) 

Both do the same thing, yet, I find the first solution a bit more readable. In the first example pattern = ... and replacement = ... are forwarded to the function as part of the ... dots implementation. For more details see ?rename_with and ?dots.
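For the exact renaming asked for in the question (lower case, dots to underscores), a minimal sketch with the formula notation could look like this. It assumes dplyr 1.0.0 or later; the variable name renamed is only for illustration:

```r
library(dplyr)

# .x is the character vector of current column names;
# fixed = TRUE makes gsub() treat "." literally instead of as a regex.
renamed <- iris %>%
  rename_with(~ tolower(gsub(".", "_", .x, fixed = TRUE)))

names(renamed)
```

With no column selection given, rename_with() applies the function to every column.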

loki
  • 1
    Thank you! I was struggling to figure out how to code this using rename_with and this did the trick. – Erik Ruzek Mar 02 '21 at 22:11
  • how would one do this for a custom function @loki ? If I write the function in the rename_with statement it works to hand over the names automagically, if I define the function elsewhere, it doesn't `argument is not an atomic vector` – TobiO Jul 07 '21 at 08:36
  • 2
    just found out: simply do not give any argument to the function but specify it as a function `mydataframe %>% rename_with(myawesomefunction)` – TobiO Jul 07 '21 at 08:49
  • This solved a problem I was having, thanks! But why are the arguments inside the `str_replace()` function pulled outside of it? I couldn't figure this syntax out from the help documentation. – a_leemo Sep 07 '21 at 00:57
  • @a_leemo the version more akin to the manual would be: `iris %>% rename_with(~ stringr::str_replace(.x, pattern = "Length", replacement = "len"), matches("Length"))` with the `~ ` and `.x` notation. However, I find this rather complicated. But nevertheless, as you rightfully pointed my proposed solution was deviating from the manual. Thanks for this critique. I'll edit my answer accordingly. – loki Sep 08 '21 at 07:47
  • Hmm, thanks @loki. I’m still getting my head around NSE in the Tidyverse. I think I tried a few approaches with ~ or ., but I don’t think I used both! Cheers for the help. – a_leemo Sep 09 '21 at 13:26
  • this is definitely the right way to use str_replace inside rename_with. Thanks for saving me a bunch of time. – Santiago Sotelo Jun 07 '22 at 16:43
  • is it possible to rename_with str_glue too? I'm trying to use something like (in a for-loop): ```mutate(str_glue("My{str_sub(data[i], 23, - 5)}_var_G1") = var)``` – Larissa Cury Oct 03 '22 at 16:24
  • 1
    @LarissaCury you probably want to use `mutate` or `rename` with `!!`. Have a look at the examples with `?rlang::\`topic-inject\``. – loki Oct 04 '22 at 11:36
9

My eloquent attempt using base, stringr and dplyr:

EDIT: library(tidyverse) now includes all three libraries.

library(tidyverse)
library(magrittr) # tidyverse attaches %>% but not %<>%, so load magrittr explicitly
# library(dplyr)
# library(stringr)
# library(magrittr)

names(iris) %<>% # %<>% pipes names(iris) through the chain and assigns the result back
    tolower() %>%
    str_replace_all("\\.", "_") # escape the dot; a bare "." is a regex matching every character
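One thing to watch with str_replace_all(): the pattern is a regular expression, so an unescaped dot matches every character. A small sketch of the difference, assuming stringr is installed (the variable names are only for illustration):

```r
library(stringr)

# Unescaped: "." matches any character, so the whole name turns into underscores.
all_replaced <- str_replace_all("Sepal.Length", ".", "_")

# Escaped: only the literal dot is replaced.
escaped <- str_replace_all("Sepal.Length", "\\.", "_")

# fixed() is an alternative that switches off regex interpretation.
literal <- str_replace_all("Sepal.Length", fixed("."), "_")

print(c(all_replaced, escaped, literal))
```

Either the escaped pattern or fixed(".") gives "Sepal_Length" here.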

I do this for building functions with piping.

my_read_fun <- function(x) {
    df <- read.csv(x)
    names(df) %<>% # clean the names in place
        tolower() %>%
        str_replace_all("_", ".")
    df %>%
        select(a, b, c, g) # keep the columns of interest and return the result
}
mtelesha
  • 1
    str_replace_all is not in either of those packages. Fyi, no need to include "edit" notations in the text of your answer; just make it the best answer possible. Folks can see the edit history if they want by clicking a link below the answer. – Frank Nov 11 '15 at 15:26
  • 1
    The period in the first `str_replace_all` function should be escaped `\\.` - otherwise everything is replaced with an underscore – sbha Mar 09 '18 at 03:02
9

For this particular [but fairly common] case, the function has already been written in the janitor package:

library(janitor)

iris %>% clean_names()

##   sepal_length sepal_width petal_length petal_width species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## .          ...         ...          ...         ...     ...

so all together,

iris %>% 
    clean_names() %>%
    gather(measurement, value, -species) %>%
    group_by(species,measurement) %>%
    summarise(avg_value = mean(value))

## Source: local data frame [12 x 3]
## Groups: species [?]
## 
##       species  measurement avg_value
##        <fctr>        <chr>     <dbl>
## 1      setosa petal_length     1.462
## 2      setosa  petal_width     0.246
## 3      setosa sepal_length     5.006
## 4      setosa  sepal_width     3.428
## 5  versicolor petal_length     4.260
## 6  versicolor  petal_width     1.326
## 7  versicolor sepal_length     5.936
## 8  versicolor  sepal_width     2.770
## 9   virginica petal_length     5.552
## 10  virginica  petal_width     2.026
## 11  virginica sepal_length     6.588
## 12  virginica  sepal_width     2.974
alistaire
2

Both select() and select_all() can be used to rename columns.

If you wanted to rename only specific columns you can use select:

iris %>% 
  select(sepal_length = Sepal.Length, sepal_width = Sepal.Width, everything()) %>% 
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

rename does the same thing, just without having to include everything():

iris %>% 
  rename(sepal_length = Sepal.Length, sepal_width = Sepal.Width) %>% 
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

select_all() works on all columns and can take a function as an argument:

iris %>% 
  select_all(tolower)

iris %>% 
  select_all(~gsub("\\.", "_", .)) 

or combining the two:

iris %>% 
  select_all(~gsub("\\.", "_", tolower(.))) %>% 
  head(2)

  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
sbha
  • this worked better and is much more straightforward than anything in the `rename` family... it's strange that it's easier to use a `select_all` with `~gsub` than `rename_at` or `rename_if` with some kind of predicate of variable declaration... it seems like that's what `rename_*` is for – dre Jun 09 '19 at 14:21
2

In case you don't want to write the regular expressions yourself, you could use

  • the snakecase package, which is very flexible,
  • janitor::make_clean_names() which has some nice defaults or
  • janitor::clean_names() which does the same as make_clean_names(), but works directly on data frames.

Invoking them inside of a pipeline should be straightforward.

library(magrittr)
library(snakecase)

iris %>% setNames(to_snake_case(names(.)))
iris %>% tibble::as_tibble(.name_repair = to_snake_case)
iris %>% purrr::set_names(to_snake_case)
iris %>% dplyr::rename_all(to_snake_case)
iris %>% janitor::clean_names()

Taz