How do I select variables in R whose names contain a particular string?

Question

I have a large data set with thousands of columns. The column names include various unwanted characters as follows:

col1*
col2*
col3*[Note]

I would like to remove the trailing * and *[Note] from all column names, leaving clean names:

col1
col2
col3

What is the most efficient way to do this for 5000+ columns?

It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. The example doesn't have to have 1000s of columns, just a few to get the point across. — MrFlick, Aug 22 '22 at 18:58

akrun · Accepted Answer · score 3 · 2022-08-22T19:40:41.340

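We could use sub from base R. The pattern \\*.* matches a literal * and everything after it, so both the * and *[Note] suffixes are removed in one vectorized call: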

names(df1) <- sub("\\*.*", "", names(df1))
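As a quick check, here is a minimal sketch on a toy data frame. The values in df1 are made up for illustration (only the column names come from the question), and check.names = FALSE is an assumption about how the data was read in: without it, data.frame() would replace the special characters with dots.

```r
# Toy data; the values are hypothetical, only the column names matter.
df1 <- data.frame(
  `col1*`       = 1:3,
  `col2*`       = 4:6,
  `col3*[Note]` = 7:9,
  check.names = FALSE  # keep "*" and "[Note]" in the names
)

# Drop the first "*" and everything after it from each name.
names(df1) <- sub("\\*.*", "", names(df1))

names(df1)
# [1] "col1" "col2" "col3"
```

Because sub() is vectorized over the character vector of names, the same single call applies unchanged to 5000+ columns.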

edited Aug 22 '22 at 19:40

answered Aug 22 '22 at 18:56

akrun

score 1 · Answer 2 · answered Aug 22 '22 at 19:18

A dplyr solution

library(dplyr)
library(stringr)
# rename_with() returns a new data frame, so assign the result to keep the clean names
df1 <- df1 %>%
  rename_with(~str_remove(string = ., pattern = "\\*.*"), everything())
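To see what the pipeline returns, here is a small sketch on the same kind of toy data. The values and the name cleaned are hypothetical; the point is only that rename_with() hands back a renamed copy and leaves its input alone, so the result has to be stored somewhere.

```r
library(dplyr)
library(stringr)

# Toy data; the values are hypothetical, only the column names matter.
df1 <- data.frame(
  `col1*`       = 1:3,
  `col2*`       = 4:6,
  `col3*[Note]` = 7:9,
  check.names = FALSE  # keep "*" and "[Note]" in the names
)

# Apply str_remove() to every column name and store the renamed copy.
cleaned <- df1 %>%
  rename_with(~ str_remove(.x, "\\*.*"), everything())

names(cleaned)
# [1] "col1" "col2" "col3"

# df1 itself still carries the original names at this point.
identical(names(df1), c("col1*", "col2*", "col3*[Note]"))
# [1] TRUE
```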

Julian
