I have a large dataset with questionnaire data at multiple time points (waves). The questionnaire was identical at each point, so variables are labeled by time in the form "w#variablename" (e.g., "w1age", "w2age", "w3age").

I split the larger file into data frames by each time point, so I would now like to remove the "w#" from the column name for each column.

Basically, I would like to use R to "find and replace" the "w1" in each column name and delete it.

I split the data as follows:

w1 = Data %>% select(matches("w1"))
w2 = Data %>% select(matches("w2"))
w3 = Data %>% select(matches("w3"))
w4 = Data %>% select(matches("w4"))

Now for each of these 4 data sets, I would like to remove the respective "w#" from column names.

Thank you!

Hunter
  • I think you can use `rename_at` without creating multiple objects `Data %>% rename_at(vars(matches("^w\\d+")), ~ str_remove(., "^w\\d+"))` – akrun May 16 '19 at 14:57
  • Do you want to _rename_ `w#` columns, or do you want to completely remove them? – Tim Biegeleisen May 16 '19 at 14:59

2 Answers

We should be able to use sub here:

names(Data) <- sub("^w\\d+", "", names(Data))

The regex pattern ^w\\d+ matches a w followed by one or more digits at the start of each column name. We then replace the match with an empty string, which removes the prefix from every matching column name.
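As a quick check of how the pattern behaves, here is a minimal sketch on a made-up wave-1 data frame. The column names and values are assumptions for illustration, not the asker's real data.

```r
# Hypothetical stand-in for the asker's `w1` data frame
w1 <- data.frame(w1age = c(34, 51), w1income = c(42000, 58000))

# Strip the leading "w" plus digits from every column name
names(w1) <- sub("^w\\d+", "", names(w1))

names(w1)
# "age" "income"
```

Because the pattern is anchored with ^, a "w1" that appears later in a name would be left alone.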

Tim Biegeleisen

A tidyverse option is rename_at. Use matches to select only the columns whose names need to change, and str_remove to strip the leading "w" followed by one or more digits.

library(dplyr)
library(stringr)
Data %>% 
   rename_at(vars(matches("^w\\d+")), ~ str_remove(., "^w\\d+"))

NOTE: If the column names in Data are w1age, w2age, ..., w100age, removing the 'w' followed by digits leaves every column with the same name, which is discouraged. In that case, we may need to wrap the result in make.unique to keep the column names unique.
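To illustrate the note, here is a small base R sketch of the name collision and of what make.unique does with it. The three-column Data below is a hypothetical example, not the asker's dataset.

```r
# Hypothetical combined data frame with the same variable at three waves
Data <- data.frame(w1age = 30, w2age = 31, w3age = 32)

# Removing the wave prefix leaves three identical names
stripped <- sub("^w\\d+", "", names(Data))
stripped
# "age" "age" "age"

# make.unique appends a numeric suffix to the duplicates
unique_names <- make.unique(stripped)
unique_names
# "age" "age.1" "age.2"
```

Note that the suffixes no longer say which wave a column came from, so this is probably only a fallback when the waves are kept in one data frame.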

akrun
  • I think they are unique, since all columns should be in different data.frames, named according to the wave they were collected in. I think the transformation should not be executed on `Data`, but rather on the resulting data.frames `w1`, `w2` and `w3`. – hannes101 May 16 '19 at 15:54
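
Following that comment, one way to apply the renaming to each per-wave data frame rather than to `Data` might look like the sketch below. It uses base R only; the example `Data`, the `waves` list, and the column names are assumptions for illustration.

```r
# Hypothetical combined data with two waves of the same questionnaire
Data <- data.frame(w1age = c(30, 41), w1score = c(5, 7),
                   w2age = c(31, 42), w2score = c(6, 8))

# Split by wave prefix (note: "^w1" would also match "w10..." if such columns existed)
waves <- list(
  w1 = Data[grepl("^w1", names(Data))],
  w2 = Data[grepl("^w2", names(Data))]
)

# Strip the wave prefix inside each data frame
waves <- lapply(waves, function(df) {
  names(df) <- sub("^w\\d+", "", names(df))
  df
})

names(waves$w1)
# "age" "score"
names(waves$w2)
# "age" "score"
```

Keeping the frames in a named list means the wave label lives in the list name, so the column names can be identical across waves without clashing.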