Consider this simple example:

data <- data_frame('data::col1' = c(1,2,3), 'data::col2' = c(1,2,3))
> data
# A tibble: 3 × 2
  `data::col1` `data::col2`
         <dbl>        <dbl>
1            1            1
2            2            2
3            3            3

This kind of dataframe is the output one would get by using Apache Pig. Here, I am able to load it using dplyr, but as you can see the names of the columns are cumbersome.

How can I use the tidyverse suite to get rid of the part before the `::`? Also, assume I have many columns with the pattern `data::mycol`, so an ideal solution should not require typing each affected column manually.

Expected output:

# A tibble: 3 × 2
   col1  col2
  <dbl> <dbl>
1     1     1
2     2     2
3     3     3

Thanks!

ℕʘʘḆḽḘ
  • `` my_data %>% rename(col1 = `data::col1`, col2 = `data::col2`) ``? – Abdou Dec 13 '16 at 20:22
  • Nice trick, thanks @Abdou, but I don't want a manual solution. Too many columns in my dataframe, unfortunately. – ℕʘʘḆḽḘ Dec 13 '16 at 20:23
  • No need to reinvent the wheel, here: `colnames(data) <- gsub("^data::","",colnames(data))`. – joran Dec 13 '16 at 20:23
  • @joran Thanks! But is it possible to do so using the `tidyverse`? – ℕʘʘḆḽḘ Dec 13 '16 at 20:24
  • You should use regular expressions. – Jon Dec 13 '16 at 20:24
  • @Noobie Probably, I guess. But I can't say that I personally have much interest in shoehorning this problem into a specific set of packages. – joran Dec 13 '16 at 20:29
  • You can use `setNames` inline: `data %>% setNames(gsub('^data::', '', names(.)))` – alistaire Dec 13 '16 at 20:36
  • Probably because you insist on a tidyverse solution, which in this case isn't any better. Anyway, an alternative: `library(stringr); str_replace(names(dat), 'data::', '')` – Jaap Dec 13 '16 at 20:36
  • @Noobie OK. But in that case you might have to mention in your question why using `tidyverse` is more desirable than any other methods. – acylam Dec 13 '16 at 20:39
  • It's more desirable for my sanity. Coming from Python, it drives me crazy that I have to use 10 different packages to do some stuff on my dataframe :D – ℕʘʘḆḽḘ Dec 13 '16 at 20:40
  • 10 packages to do stuff on your dataframe? Unless you are doing some very exotic things, most dataframe manipulation can be done in base R afaik – Jaap Dec 13 '16 at 20:43
  • I think the answer is that you can't, or at least not easily. `dplyr` and `tidyr` are packages that make it easier to work with `data.frame`s. They provide a user-friendly framework to think about and modify them. Ultimately the column names are a `character` vector, so they are not really the focus of those packages. The various base R solutions given are the way to go. – John Paul Dec 13 '16 at 20:44
  • @Noobie you do realise that you are using "10 different packages" when loading tidyverse? It's just a meta package... – Gavin Simpson Dec 13 '16 at 20:44
  • ...the base R solutions would result in you using _fewer_ packages, if that's what you're after. – alistaire Dec 13 '16 at 20:44
  • *"it drives me crazy that I have to use 10 different packages to do some stuff on my dataframe"* -- `tidyverse` is literally an agglomeration of several packages. Joran's solution uses zero packages. What's the issue? – nrussell Dec 13 '16 at 20:44
  • @Noobie Well, part of learning R is to know when _not_ to use a package if it is possible to use much simpler solutions. Learning different ways to solve a problem is nice, but this is not one of those cases where you should ditch the base R approach. Maybe you will come to a case in the future where it is _necessary_ or _easier_ to use `tidyverse`. – acylam Dec 13 '16 at 20:51

1 Answer

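library(dplyr)  # provides the %>% pipe

# check.names = FALSE keeps the literal "data::" prefix; by default
# data.frame() would rewrite the names to "data..col1" and "data..col2"
data <- data.frame('data::col1' = c(1,2,3), 'data::col2' = c(1,2,3), check.names = FALSE)
# Strip the leading "data::" from every column name
names(data) <- names(data) %>%
  gsub("^data::", "", .)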
Lee88
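
For a version that stays within the tidyverse, as the question asks, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where `rename_with()` is available (that function postdates this thread), and the name `clean` is only illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for rename_with()

# Same toy data as in the question; backticks allow the non-syntactic names
data <- tibble(`data::col1` = c(1, 2, 3), `data::col2` = c(1, 2, 3))

# Apply sub() to every column name that starts with "data::",
# dropping that prefix; any other columns would be left untouched
clean <- data %>%
  rename_with(~ sub("^data::", "", .x), .cols = starts_with("data::"))

names(clean)
```

Selecting with `starts_with("data::")` means no column has to be typed by hand, and anchoring the pattern with `^` avoids touching a `data::` that might appear in the middle of a name.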