dplyr: how to modify column names based on a pattern?

Question

Consider this simple example:

data <- data_frame('data::col1' = c(1,2,3), 'data::col2' = c(1,2,3))
> data
# A tibble: 3 × 2
  `data::col1` `data::col2`
         <dbl>        <dbl>
1            1            1
2            2            2
3            3            3

This kind of dataframe is the output one would get by using Apache Pig. Here, I am able to load it using dplyr, but as you can see the names of the columns are cumbersome.

How can I use the tidyverse suite to get rid of the part up to and including the ::? Also, assume I have many columns with the pattern data::mycol, so an ideal solution should not require typing each affected column manually.

Expected output:

# A tibble: 3 × 2
   col1  col2
  <dbl> <dbl>
1     1     1
2     2     2
3     3     3

Thanks!

```my_data %>% rename(col1 = `data::col1`, col2 = `data::col2`)```? — Abdou, Dec 13 '16 at 20:22
Nice trick, thanks @Abdou, but I don't want a manual solution. There are too many columns in my dataframe, unfortunately. — ℕʘʘḆḽḘ, Dec 13 '16 at 20:23
No need to reinvent the wheel, here: `colnames(data) <- gsub("^data::","",colnames(data))`. — joran, Dec 13 '16 at 20:23
@joran thanks! but is it possible to do so using the `tidyverse`? — ℕʘʘḆḽḘ, Dec 13 '16 at 20:24
@Noobie Probably, I guess. But I can't say that I personally have much interest in shoehorning this problem into a specific set of packages. — joran, Dec 13 '16 at 20:29
You can use `setNames` inline: `data %>% setNames(gsub('^data::', '', names(.)))` — alistaire, Dec 13 '16 at 20:36
Probably because you insist on a tidyverse solution, which in this case isn't any better. Anyway, an alternative: `library(stringr); str_replace(names(dat), 'data::', '')` — Jaap, Dec 13 '16 at 20:36
@Noobie OK. But in that case you might have to mention in your question why using `tidyverse` is more desirable than any other method. — acylam, Dec 13 '16 at 20:39
It's more desirable for my sanity. Coming from Python, it drives me crazy that I have to use 10 different packages to do some stuff on my dataframe :D — ℕʘʘḆḽḘ, Dec 13 '16 at 20:40
10 packages to do stuff on your dataframe? Unless you are doing some very exotic things, most dataframe manipulation can be done in base R afaik — Jaap, Dec 13 '16 at 20:43
I think the answer is that you can't, or at least not easily. `dplyr` and `tidyr` are packages that make it easier to work with `data.frame`s. They provide a user-friendly framework to think about and modify them. Ultimately the column names are a `character` vector, so they are not really the focus of those packages. The various base R solutions given are the way to go. — John Paul, Dec 13 '16 at 20:44
@Noobie you do realise that you are using "10 different packages" when loading tidyverse? It's just a meta package... — Gavin Simpson, Dec 13 '16 at 20:44
...the base R solutions would result in you using _fewer_ packages, if that's what you're after. — alistaire, Dec 13 '16 at 20:44
*" it drives me crazy that I have to use 10 different packages to do some stuff on my dataframe"* -- `tidyverse` is literally an agglomeration of several packages. Joran's solution uses zero packages. What's the issue? — nrussell, Dec 13 '16 at 20:44
@Noobie Well, part of learning R is to know when _not_ to use a package if it is possible to use much simpler solutions. Learning different ways to solve a problem is nice, but this is not one of those cases where you should ditch the base R approach. Maybe you will come to a case in the future where it is _necessary_ or _easier_ to use `tidyverse`. — acylam, Dec 13 '16 at 20:51

score 0 · Answer 1 · answered Dec 13 '16 at 20:27

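library(dplyr)
library(purrr)

# data.frame() defaults to check.names = TRUE, so "data::col1" becomes "data..col1"
data <- data.frame('data::col1' = c(1,2,3), 'data::col2' = c(1,2,3))
# In the regex each "." matches any character, so this strips "data.." (and would also strip "data::")
names(data) <- names(data) %>%
  gsub("data..", "", .)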

Lee88

Sorry, just saw your request for a tidyverse solution. – Lee88 Dec 13 '16 at 20:27
If you add a `tidyverse` solution, please also keep the `purrr` one. It's nice. – ℕʘʘḆḽḘ Dec 13 '16 at 20:28
Isn't this the same as @joran's in the comments, except that it uses `%>%` instead of nesting the functions? What is here from `purrr`? – John Paul Dec 13 '16 at 20:32
@JohnPaul Yes, he wrote that comment while I was writing my answer. Our answers are the same. – Lee88 Dec 13 '16 at 20:44
This doesn't actually use anything from `purrr`, does it? Just needs pipes from `dplyr` (or magrittr) – Spacedman Dec 13 '16 at 21:01
Wouldn't `data %>% setNames(names(data) %>% stringr::str_replace("data..","")) -> data` be more tidyverse-ish? – Ben Bolker Dec 14 '16 at 03:07
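
For reference, here is a minimal sketch of how the renaming discussed in this thread could be written with tidyverse functions only. It is not taken from any of the posts above: it assumes dplyr 1.0.0 or later (where `rename_with()` is available) and stringr 1.3.0 or later (for `str_remove()`), and the object name `renamed` is just an illustrative choice.

```r
library(dplyr)
library(stringr)

# Same toy data as in the question; tibble() keeps the "data::" names unchanged.
data <- tibble(`data::col1` = c(1, 2, 3), `data::col2` = c(1, 2, 3))

# Apply a function to the column names: drop a leading "data::" where present.
# `renamed` is a hypothetical name used only for this example.
renamed <- data %>%
  rename_with(~ str_remove(.x, "^data::"))

names(renamed)
```

Anchoring the pattern with `^` means only a leading `data::` is removed, and columns without that prefix are left untouched, so no column has to be listed by hand.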
