Remove columns that contain a specific word

Question

I have a data set with 313 columns and ~52,000 rows. I need to remove every column that contains the word "PERMISSIONS". I've tried grep and dplyr, but I can't get either to work.

I've read the file in:

testSet <- read.csv("/Users/.../data.csv")

Other examples show how to remove columns by exact name, but I don't know how to handle wildcards. I'm not sure where to go from here.

Do you mean remove columns where *the column name* includes `PERMISSIONS` or where *a string somewhere in the column data* includes `PERMISSIONS`? — Gregor Thomas, Jan 23 '17 at 20:32
Is the word "PERMISSIONS" in the column names, or in the rows within the columns (i.e. the data)? – JustGettinStarted, Jan 23 '17 at 20:32

score 12 · Answer 1 · edited Jul 20 '17 at 15:55


If you just want to remove columns whose names contain "PERMISSIONS", you can use the `select()` function from the dplyr package together with the `contains()` helper.

library(dplyr)

df <- data.frame("PERMISSIONS" = c(1,2), "Col2" = c(1,4), "Col3" = c(1,2))
df
#   PERMISSIONS Col2 Col3
# 1           1    1    1
# 2           2    4    2

df_sub <- select(df, -contains("PERMISSIONS"))
df_sub
#   Col2 Col3
# 1    1    1
# 2    4    2
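The `contains()` helper already behaves like the wildcard the question asks about: it matches the string anywhere in a column name. It also ignores case by default (`ignore.case = TRUE`), which may or may not be what you want. A small sketch with made-up column names:

```r
library(dplyr)

# Hypothetical column names, made up for illustration
df <- data.frame(READ_PERMISSIONS = 1:2, permissions_note = c("x", "y"),
                 user_id = 3:4)

# contains() matches the string anywhere in the name and ignores case by default,
# so both READ_PERMISSIONS and permissions_note are dropped
kept_any_case <- names(select(df, -contains("PERMISSIONS")))
kept_any_case    # "user_id"

# Case-sensitive match: only READ_PERMISSIONS is dropped
kept_exact_case <- names(select(df, -contains("PERMISSIONS", ignore.case = FALSE)))
kept_exact_case  # "permissions_note" "user_id"
```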

answered Jul 20 '17 at 15:51 by Brit; edited Jul 20 '17 at 15:55 by gung - Reinstate Monica

I get this error `Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’` and can't determine why. Any advice? – Ben Nov 07 '18 at 03:14
What if I need to exclude columns matching several different strings? – ah bon Nov 23 '21 at 08:18
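Both comments can be addressed with a short sketch. The error in the first comment is an S4 dispatch error, which suggests that `select()` is being resolved to a generic from some other attached package rather than dplyr's; calling `dplyr::select()` with the package prefix most likely avoids it. For several words, `contains()` accepts a character vector (the union of matches is taken), and `matches()` takes a regular expression. The second word, "ACCESS", is a hypothetical example:

```r
library(dplyr)

# Hypothetical data: two words to exclude, PERMISSIONS and ACCESS
df <- data.frame(PERMISSIONS_A = 1:2, ACCESS_B = 1:2, keep_me = 1:2)

# The dplyr:: prefix calls dplyr's select() even if another package masks it
by_vector <- dplyr::select(df, -contains(c("PERMISSIONS", "ACCESS")))

# Equivalent: one regular expression with alternation
by_regex <- dplyr::select(df, -matches("PERMISSIONS|ACCESS"))

names(by_vector)  # "keep_me"
names(by_regex)   # "keep_me"
```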

score 6 · Answer 2 · answered Jan 23 '17 at 20:57

As I understand the question, the OP has a data frame like this:

df <- read.table(text = '
           a b c d
           e f PERMISSIONS g
           h i j k
           PERMISSIONS l m n',
                 stringsAsFactors = FALSE)

The goal is to remove every column that has at least one 'PERMISSIONS' entry. Assuming the entries match 'PERMISSIONS' exactly (same case, no surrounding text), this code should work:

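# Count the exact 'PERMISSIONS' matches in each column
cols <- colSums(mapply('==', 'PERMISSIONS', df))
# Keep columns with no matches; drop = FALSE keeps a data frame even if one column is left
new.df <- df[, which(cols == 0), drop = FALSE]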
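If the entries can vary (for example "PERMISSIONS:read" or different capitalization), the exact `==` comparison misses them. A base-R sketch that instead flags any column where some cell contains the word, using `grepl()`; the sample data are made up for illustration:

```r
df <- read.table(text = '
           a b c d
           e f PERMISSIONS:read g
           h i j k
           permissions l m n',
                 stringsAsFactors = FALSE)

# TRUE for every column where at least one cell contains the word (any case)
has_word <- vapply(df, function(x) any(grepl("PERMISSIONS", x, ignore.case = TRUE)),
                   logical(1))

# drop = FALSE keeps a data frame even if only one column survives
new.df <- df[, !has_word, drop = FALSE]
names(new.df)  # "V2" "V4"
```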

score 5 · Answer 3 · JustGettinStarted

Try this; it drops every column whose name contains "PERMISSIONS":

New.testSet <- testSet[, !grepl("PERMISSIONS", colnames(testSet)), drop = FALSE]

EDIT: changed script as per comment.
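One caveat with `[` indexing, shown on a small hypothetical stand-in for `testSet`: when only one column survives the filter, `testSet[, keep]` returns a bare vector instead of a data frame. Adding `drop = FALSE` keeps the result a data frame:

```r
# Hypothetical stand-in for testSet: only one column lacks PERMISSIONS in its name
testSet <- data.frame(PERMISSIONS_1 = 1:3, PERMISSIONS_2 = 4:6, id = 7:9)

keep <- !grepl("PERMISSIONS", colnames(testSet))

simplified <- testSet[, keep]                # collapses to the integer vector 7 8 9
as_frame   <- testSet[, keep, drop = FALSE]  # stays a data frame with column "id"

class(simplified)  # "integer"
class(as_frame)    # "data.frame"
```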

answered Jan 23 '17 at 20:35 by JustGettinStarted; edited Jan 23 '17 at 20:48

score 3 · Answer 4 · akrun

We can use `grepl()` negated with `!`, filtering on the row names and the column names at the same time:

New.testSet <- testSet[!grepl("PERMISSIONS",row.names(testSet)),
                         !grepl("PERMISSIONS", colnames(testSet))]

answered Jan 23 '17 at 20:43 by akrun; edited Jan 23 '17 at 20:51


He wants columns that have "PERMISSIONS" anywhere in the rows to be dropped as well. – Kristofersen Jan 23 '17 at 20:49
The OP has been asked twice to clarify this in the comments. As it stands, I feel it's open to interpretation. – JustGettinStarted Jan 24 '17 at 01:43

score 2 · Answer 5 · answered Jan 23 '17 at 21:22

It looks like the other answers only handle one of the two cases (column names or cell values). I think this is what you're looking for, though there is probably a better way to write it.

library(data.table)
df = data.frame("PERMISSIONS" = c(1,2), "Col2" = c("PERMISSIONS","A"), "Col3" = c(1,2))

  PERMISSIONS        Col2 Col3
1           1 PERMISSIONS    1
2           2           A    2

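# Drop columns whose names contain the word
df = df[,!grepl("PERMISSIONS",colnames(df))]
setDT(df)
# Logical table: TRUE wherever a cell contains the word
ind = df[, lapply(.SD, function(x) grepl("PERMISSIONS", x, perl=TRUE))]
# Keep only the columns with no matching cells
df[,which(colSums(ind) == 0), with = FALSE]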

   Col3
1:    1
2:    2
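Since the answer notes there is probably a better way to write this, here is a base-R sketch (no data.table) of the same two checks, applied to the example data frame above. `stringsAsFactors = FALSE` is an added assumption so the text column stays character in every R version:

```r
df <- data.frame("PERMISSIONS" = c(1, 2), "Col2" = c("PERMISSIONS", "A"),
                 "Col3" = c(1, 2), stringsAsFactors = FALSE)

# Columns whose *name* contains the word
bad_name <- grepl("PERMISSIONS", names(df))

# Columns where any *cell* contains the word
bad_data <- vapply(df, function(x) any(grepl("PERMISSIONS", x)), logical(1))

# Drop both kinds; drop = FALSE keeps a data frame when one column is left
result <- df[, !(bad_name | bad_data), drop = FALSE]
result
#   Col3
# 1    1
# 2    2
```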
