I need to rename several columns whose names follow a string pattern. Let's use this data frame as an example.

library(tidyverse)  # tibble is loaded as part of the tidyverse

df = as_tibble(matrix(0, nrow = 3, ncol = 30), .name_repair = "minimal")

colnames(df) = c("p1", "BNT2", "BNT3", "BNT4","BNT5","BNT6","BNT7","BNT8","BNT9","BNT10",
                 "BNT11","BNT12","BNT13","BNT14" ,"BNT15", "groupTime186", "groupTime187", "groupTime188", "groupTime189", "groupTime190", "groupTime191", 
                 "groupTime192", "groupTime193", "groupTime194", "groupTime195" ,"groupTime196", "groupTime197", 
                 "groupTime198", "groupTime199", "groupTime200")

# A tibble: 3 x 30
     p1  BNT2  BNT3  BNT4  BNT5  BNT6  BNT7  BNT8  BNT9 BNT10 BNT11 BNT12 BNT13 BNT14 BNT15 groupTime186 groupTime187 groupTime188
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>        <dbl>        <dbl>        <dbl>
1     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0            0            0            0
2     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0            0            0            0
3     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0            0            0            0
# ... with 12 more variables: groupTime189 <dbl>, groupTime190 <dbl>, groupTime191 <dbl>, groupTime192 <dbl>, groupTime193 <dbl>,
#   groupTime194 <dbl>, groupTime195 <dbl>, groupTime196 <dbl>, groupTime197 <dbl>, groupTime198 <dbl>, groupTime199 <dbl>,
#   groupTime200 <dbl>

Normally I would use gsub and set_names to capture the item number and build the new name from it, like this:

df %>% 
  set_names(gsub("p([0-9]{1,2})|BNT([0-9]{1,2})", "BOS_\\1\\2_cod", names(.)))

With this I can reuse the consecutive numbers from the original names. The problem is that, because of the software we use to export responses, the time columns usually have a numbering that does not start at 01, so I can't reuse it. Instead, I have to select only the time columns, build their names with colnames and paste0, and then rejoin them with the rest of the data. Like this:

time_cols = select(df, starts_with("groupTime"))
colnames(time_cols) = paste0("BOS_", sprintf("%02d", 1:15), "_time")

I don't think this is a good approach, because it requires more steps and is not part of the original piped code that renames the answer columns.

My question is: how can I select the columns to be renamed and supply a vector that contains their new names? Alternatively, can I use a sequence, like sprintf("%02d", 1:15), so that gsub replaces the first column name with the first term of the sequence, the second with the second, and so on? Ideally, I want a solution that can be embedded in piped code (dplyr).

UPDATE: The expected output is the same data frame, but with these column names:

 [1] "BOS_01_raw"  "BOS_02_raw"  "BOS_03_raw"  "BOS_04_raw"  "BOS_05_raw"  "BOS_06_raw"  "BOS_07_raw"  "BOS_08_raw"  "BOS_09_raw"  "BOS_10_raw" 
[11] "BOS_11_raw"  "BOS_12_raw"  "BOS_13_raw"  "BOS_14_raw"  "BOS_15_raw"  "BOS_01_time" "BOS_02_time" "BOS_03_time" "BOS_04_time" "BOS_05_time"
[21] "BOS_06_time" "BOS_07_time" "BOS_08_time" "BOS_09_time" "BOS_10_time" "BOS_11_time" "BOS_12_time" "BOS_13_time" "BOS_14_time" "BOS_15_time"

As I said before, I can rename the BNT items because they are already numbered; the groupTime columns are the problem.

niklai
  • Can you show the expected string? – akrun Jun 18 '17 at 03:37
  • The best way is to develop a script to tidy your data so you don't have variables contained in your column names. It's pretty unintelligible at the moment, though, so I'm not wholly sure what that would ideally look like. – alistaire Jun 18 '17 at 03:40
  • This might be helpful: https://stackoverflow.com/questions/44452108/how-to-rename-multiple-columns-given-character-vectors-of-column-names-and-repla – mt1022 Jun 18 '17 at 04:27

1 Answer

I managed to solve the problem thanks to @mt1022's comment. According to "How to rename multiple columns given character vectors of column names and replacement in dplyr 0.6.0?":

First, a vector with the new names has to be created. It needs one name per column being renamed, which is 15 for the example data.

names_boston = paste0("BOS_", sprintf("%02d", 1:15), "_time")

Then the columns can be selected with grep, and the new names can be fed to rename_at.

df %>%
rename_at(vars(grep("Time", names(.))), ~names_boston)

To avoid creating an intermediate vector, you can build the names directly inside the rename_at call:

df %>%
    rename_at(vars(grep("Time", names(.))), ~paste0("BOS_", sprintf("%02d", 1:15), "_time"))
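
For readers on a more recent dplyr, the same idea can probably be expressed with rename_with, which renames a selected set of columns with a function. The following is only a sketch under a few assumptions: the data frame is a stand-in rebuilt with as.data.frame rather than the original tibble, the "_raw" suffix is taken from the expected output in the question's update, and the object name `renamed` is made up for illustration.

```r
library(dplyr)

# Stand-in for the example data: 15 answer columns followed by 15 time columns.
df <- as.data.frame(matrix(0, nrow = 3, ncol = 30))
names(df) <- c("p1", paste0("BNT", 2:15), paste0("groupTime", 186:200))

renamed <- df %>%
  # Answer columns: reuse the item number that is already in the name.
  rename_with(~ sprintf("BOS_%02d_raw", as.integer(gsub("\\D", "", .x))),
              .cols = matches("^(p|BNT)[0-9]+$")) %>%
  # Time columns: ignore the exported number and number by position instead.
  rename_with(~ sprintf("BOS_%02d_time", seq_along(.x)),
              .cols = starts_with("groupTime"))

names(renamed)
```

Because the time columns are numbered with seq_along, the names follow the number of selected columns and no hard-coded 1:15 is needed. rename_with requires dplyr 1.0.0 or later.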
Gorka