1

I need to rename multiple variables using a replacement dataframe. This replacement dataframe also includes regex. I would like to use a similar solution proposed here, .e.g

df %>% rename_with(~ newnames, all_of(oldnames))

MWE:

df <- mtcars[, 1:5]

# works without regex 
replace_df_1 <- tibble::tibble(
  old = df %>% colnames(),
  new = df %>% colnames() %>% toupper()
) 

df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))


# with regex

replace_df_2 <- tibble::tibble(
  old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
  new = df %>% colnames() %>% toupper()
)

  old        new  
  <chr>      <chr>
1 ^m         MPG  
2 cyl101|cyl CYL  
3 disp       DISP 
4 hp         HP   
5 drat       DRAT 

# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))

EDIT 1:

The solution of @Mael works in general, but there seems to be index issue, e.g. consider the following example

replace_df_2 <- tibble::tibble(
  old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
  new = mtcars[,1:5] %>% colnames() %>% toupper()
)


mtcars[, 1:5] %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))

Results in

 mpg   MPG  disp    hp  drat
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9 

meaning that the rename_with function correctly finds the column, but replaces it with the first item in the replacement column. How can we tell the function to take the respective row where a replacement has been found? So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL).

Thank you for any hints!

Julian
  • 6,586
  • 2
  • 9
  • 33

3 Answers3

2

matches should be on the regex-y column:

df %>% 
  rename_with(~ replace_df_2$new, matches(replace_df_2$old))
                     MPG CYL  DISP  HP DRAT
Mazda RX4           21.0   6 160.0 110 3.90
Mazda RX4 Wag       21.0   6 160.0 110 3.90
Datsun 710          22.8   4 108.0  93 3.85
Hornet 4 Drive      21.4   6 258.0 110 3.08
Hornet Sportabout   18.7   8 360.0 175 3.15
Valiant             18.1   6 225.0 105 2.76
#...
Maël
  • 45,206
  • 3
  • 29
  • 67
  • 1
    such a stupid error from my side, thank you! I blame monday for this ;) – Julian Sep 12 '22 at 11:46
  • Hey Mael, I make an edit since I ran into an issue. Maybe you have time to take a look. Cheers! – Julian Sep 12 '22 at 13:53
  • @Andrew I tried to replicate my problem with the x and y's. The renaming function should work on different dataframes. So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL). – Julian Sep 12 '22 at 14:08
0

If the task is simply to set all col names to upper-case, then this works:

sub("^(.+)$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG"  "CYL"  "DISP" "HP"   "DRAT"

In dplyr:

df %>%
  rename_with( ~sub("^(.+)$", "\\U\\1", colnames(df), perl = TRUE))
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
  • Thank you for your answer Chris. The uppercasing was just for the example, the problem was concerning the regex inside the replacement column, but @Mael answer concerning the placement of `matches` has solved it. – Julian Sep 12 '22 at 11:48
0

I found a solution using the idea of non standard evaluation from this question and @Maël's answer.

Using map_lgl we create a logical vector that returns TRUE if the column in replace_df_2$old can be found inside the dataframe df. Then we pass this logical vector to replace_df_2$new to get the correct replacement.

df <- mtcars[, 1:5]
df %>% 
  rename_with(.fn = ~replace_df_2$new[map_lgl(replace_df_2$old,~ any(str_detect(., names(df))))], 
              .cols = matches(replace_df_2$old))

Result:

                     mpg CYL  disp  hp drat
Mazda RX4           21.0   6 160.0 110 3.90
Julian
  • 6,586
  • 2
  • 9
  • 33