How to drop columns by name pattern in R?

Question

I have this dataframe:

state county city  region  mmatrix  X1 X2 X3    A1     A2     A3      B1     B2     B3      C1      C2      C3

  1      1     1      1     111010   1  0  0     2     20    200       Push      8     12      NA      NA      NA
  1      2     1      1     111010   1  0  0     4     NA    400       Shove      9     NA

Now I want to exclude columns whose names end with a certain string, say "1" (i.e. A1 and B1). I wrote this code:

df_redacted <- df[, -grep("\\1$", colnames(df))]

However, this seems to delete every column. How can I modify the code so that it only deletes the columns that matches the pattern (i.e. ends with "3" or any other string)?

The solution has to be able to handle a dataframe with has both numerical and categorical values.

score 90 · Answer 1 · edited May 01 '21 at 15:05

90

I found a simple answer using dplyr/tidyverse. If your colnames contain "This", then all variables containing "This" will be dropped.

library(dplyr) 
df_new <- df %>% select(-contains("This"))

edited May 01 '21 at 15:05

NelsonGon

13,015
7
27
57

answered Apr 11 '18 at 06:04

Samuel Saari

1,023
8
13

2

If I need to drop columns contains `this` or `that`? I use `df %>% select(-contains("this|that"))`, it seems not working. – ah bon Nov 23 '21 at 08:24
1

You are close: ```df %>% select(-(contains("this") | contains("that")))``` – Samuel Saari Nov 24 '21 at 06:31
4

Or `df %>% select(-contains(c('this', 'that')))` works as well. – ah bon Nov 24 '21 at 06:34
Hi @SamuelSaari, how to remove columns which start with `this` or `that`? – Roy Jan 05 '23 at 18:18
You can try: ```select(-startsWith(-c("this", "that" ))```. It will delete only the columns that begin with the string 'this' or 'that'.. – Sandy Jan 06 '23 at 23:59

score 54 · Answer 2 · answered Mar 27 '13 at 18:17

Your code works like a charm if I apply it to a minimal example and just search for the string "A":

df <- data.frame(ID = 1:10,
                 A1 = rnorm(10),
                 A2 = rnorm(10),
                 B1 = letters[1:10],
                 B2 = letters[11:20])
df[, -grep("A", colnames(df))]

So your problem is more a regular expression problem, not how to drop columns. If I run your code, I get an error:

df[, -grep("\\3$", colnames(df))]
Error in grep("\\3$", colnames(df)) : 
  invalid regular expression '\3$', reason 'Invalid back reference'

Update: Why don't you just use this following expression?

df[, -grep("1$", colnames(df))]
   ID         A2 B2
1   1  2.0957940  k
2   2 -1.7177042  l
3   3 -0.0448357  m
4   4  1.2899925  n
5   5  0.7569659  o
6   6 -0.5048024  p
7   7  0.6929080  q
8   8 -0.5116399  r
9   9 -1.2621066  s
10 10  0.7664955  t

.@SimonO'Hanlon - I am using `grepl()` as `data <- data [, !grepl("Unique-",names(data))]` to remove column where column name starts with `Unique-`. I see that `grepl()` is appending duplicate column name with `.1`,`.2`,`.3` etc. Is it possible to use `grepl()` without affecting column name even if there are duplicates in the data frame? — Chetan Arvind Patil, Jan 22 '19 at 18:51
When running `df[, -grep("A", colnames(df))]`, if no match is found grep returns 0 and a data.frame with zero columns is returned. — mindlessgreen, Jul 26 '19 at 23:20

score 17 · Answer 3 · answered Mar 30 '17 at 09:12

17

Just as an additional answer, since I stumbled across this, when looking for the data.table solution to this problem.

library(data.table)
dt <- data.table(df)
drop.cols <- grep("1$", colnames(dt))
dt[, (drop.cols) := NULL]

answered Mar 30 '17 at 09:12

hannes101

2,410
1
17
40

score 7 · Answer 4 · edited Jul 16 '19 at 09:58

7

For excluding any string you can use...

 # Search string to exclude
 strng <- "1"
 df <- data.frame(matrix(runif(25,max=10),nrow=5))
 colnames(df) <- paste( "EX" , 1:5 )
 df_red <- df[, -( grep(paste0( strng , "$" ) , colnames(df),perl = TRUE) ) ]

    df
#         EX 1     EX 2        EX 3     EX 4     EX 5
#   1 7.332913 4.972780 1.175947853 6.428073 8.625763
#   2 2.730271 3.734072 6.031157537 1.305951 8.012606
#   3 9.450122 3.259247 2.856123205 5.067294 7.027795
#   4 9.682430 5.295177 0.002015966 9.322912 7.424568
#   5 1.225359 1.577659 4.013616377 5.092042 5.130887

    df_red
#         EX 2        EX 3     EX 4     EX 5
#   1 4.972780 1.175947853 6.428073 8.625763
#   2 3.734072 6.031157537 1.305951 8.012606
#   3 3.259247 2.856123205 5.067294 7.027795
#   4 5.295177 0.002015966 9.322912 7.424568
#   5 1.577659 4.013616377 5.092042 5.130887

edited Jul 16 '19 at 09:58

markus

25,843
5
39
58

answered Mar 27 '13 at 18:18

Simon O'Hanlon

58,647
14
142
184

1

Sorry for the inconsistency. The point is that I want to be able to specify any string that is the end of a colname, and then delete all columns that has that string at the end. So for this example data "1", "3", "ity", "ion", & "rix" would all be valid examples. – histelheim Mar 27 '13 at 18:23
Yes - as long as it works with other strings as well. For example, in one dataframe I need to delete every column that ends with the string "timestamps" – histelheim Mar 27 '13 at 18:35

score 3 · Answer 5 · answered Sep 21 '21 at 02:53

3

If you are specifically looking for a pattern that appears at the end of the column name, to drop those columns, you can use the following command:

library(dplyr) 
df_new <- df %>% select(-ends_with("linear"))

All the columns that end with the string linear will be dropped.

answered Sep 21 '21 at 02:53

Sandy

1,100
10
18

Hi @sandy, the columns start with `X` or `NA`? – Roy Jan 05 '23 at 18:12
1

Did you mean removing all columns with names beginning with ```X``` or ```NA```? You can try this : ```df %>% select(-(startsWith("X") | startsWith("NA")))``` – Sandy Jan 06 '23 at 23:57

score 0 · Answer 6 · answered Feb 28 '20 at 22:04

You can expand it further using regex for a broader pattern search. I have a data frame that has a bunch of columns with "name", "upper_name"and"lower_name"` as they represent confidence intervals for a bunch of series, but I don't need them all. So, using regex, you can do the following:

pattern = "(upper_[a-z]*)|(lower_[a-z]*)"
policyData <- policyData[, -grep(pattern = pattern, colnames(policyData))]

The "|" allows me to include an or statement in the regex so I can do it once with a single patter rather than look for each pattern.

score 0 · Answer 7 · answered Mar 01 '23 at 17:59

If the first solution is failing for you with the following error:

Error in select(., -contains("unknown")) : 
unused argument (-contains("unknown"))

Your R may be be trying to use 'select' from the MASS module. To fix this, use

new_df <- df %>% dplyr::select(-contains("unknown"))

(Would have said this in a comment, but not enough rep yet)

How to drop columns by name pattern in R?

7 Answers7

Linked

Related