16

I want to find rows in a dataframe that do not match a pattern.

 Key = c(1,2,3,4,5)
 Code = c("X348","I605","B777","I609","F123")
 df1 <- data.frame(Key, Code)

I can find items beginning with I60 using:

 df2 <- subset (df1, grepl("^I60", df1$Code))

But I want to be able to find all the other rows (that is, those NOT beginning with I60). The invert argument does not work with grepl. grep on its own does not find all rows, nor can it pass the results to the subset command. Grateful for help.

Scott C Wilson
Stewart Wiseman
  • 2
    `subset(df1, !grepl("^I60", Code))` – akrun Jan 22 '15 at 10:10
  • 9
    Not sure what you mean that `grep` doesn't work. `df1[grep("^I60", df1$Code, invert = TRUE), ]` or `df1[-grep("^I60", df1$Code), ]` seems to work fine. I also never understood why someone would use `subset`. It always reminds me of this strange urge people have to use `plyr` for some reason. – David Arenburg Jan 22 '15 at 10:12
  • Fair point, just (bad) habit, but I'm new to R. Thanks for your comments, appreciated. – Stewart Wiseman Jan 22 '15 at 10:38
  • Upvote for use of 'invert' there - for some reason that flag had escaped me when using grep. Neat. It also works well when combined with the pipe (`|`) to exclude multiple patterns, e.g. `df1[grep("^I60|^F123", df1$Code, invert = TRUE), ]` – Pascoe Aug 18 '20 at 14:16
  • which(!grepl()) gives you the inverse of grep() – Christopher Carroll Smith Jan 08 '23 at 01:19
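
To make the comments above concrete, here is a minimal sketch using the data from the question. The names `keep_logical` and `keep_index` are made up for illustration. It shows that negating `grepl` and calling `grep` with `invert = TRUE` select the same rows, and that either result can be used for subsetting.

```r
Key  <- c(1, 2, 3, 4, 5)
Code <- c("X348", "I605", "B777", "I609", "F123")
df1  <- data.frame(Key, Code)

# Logical vector: TRUE for every row whose Code does NOT start with "I60"
keep_logical <- !grepl("^I60", df1$Code)

# Integer vector: positions of the rows whose Code does NOT start with "I60"
keep_index <- grep("^I60", df1$Code, invert = TRUE)

# which() turns the logical vector into the same positions
identical(which(keep_logical), keep_index)

# Both forms pick rows 1, 3 and 5
subset(df1, keep_logical)
df1[keep_index, ]
```

`subset` expects a logical condition, which is why the negated `grepl` fits it directly, while the integer positions from `grep` are used with `[`.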

2 Answers

21

You could use the [ operator and do

df1[!grepl("^I60", Code), ]

(Suggested clarification from @Hugh:) Another way would be

df1[!grepl("^I60", df1$Code), ]

Here is the reference manual on array indexing, which is done with [:

http://cran.r-project.org/doc/manuals/R-intro.html#Array-indexing

Scott C Wilson
  • 1
    +1 though note that `,Code` is only valid because the variable was created independent of the data frame. (If `Code` wasn't an object in the environment, but was simply a column name, this code wouldn't work -- though only a tiny modification would be required.) – Hugh Jan 22 '15 at 11:54
  • Like @Hugh said, this wouldn't work for the real data set, not to mention that both the answers here were already provided in the comments long ago. – David Arenburg Jan 22 '15 at 11:59
  • Yes, thanks for simplifying. I must use the [ operator more! – Stewart Wiseman Jan 22 '15 at 12:06
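
A short sketch of the point raised in the comments, assuming a data frame whose columns exist only inside it (so there is no standalone `Code` vector in the workspace). The result names `not_i60` and `not_i60_with` are illustrative only.

```r
# Build the data frame without leaving Key or Code in the workspace
df1 <- data.frame(
  Key  = c(1, 2, 3, 4, 5),
  Code = c("X348", "I605", "B777", "I609", "F123")
)

# Code is only a column here, so refer to it through the data frame
not_i60 <- df1[!grepl("^I60", df1$Code), ]

# with() evaluates the expression using df1's columns, so bare Code works
not_i60_with <- df1[with(df1, !grepl("^I60", Code)), ]

not_i60
```

Both forms return the rows with keys 1, 3 and 5. Writing `df1$Code` explicitly is probably the least surprising choice for a real data set.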
4

Also, you can try this:

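 Key <- c(1, 2, 3, 4, 5)
 Code <- c("X348", "I605", "B777", "I609", "F123")
 df1 <- data.frame(Key, Code)
 # Row positions whose Code starts with "I60"
 toRemove <- grep("^I60", df1$Code)
 # Negative indices drop those rows (assumes at least one match)
 df2 <- df1[-toRemove, ]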
Fedorenko Kristina
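
One caveat with the negative-index approach, sketched below: when the pattern matches nothing, `grep` returns an empty integer vector, and indexing with its negation selects no rows at all instead of keeping every row. The pattern `"^Z99"` is a made-up example chosen because it matches none of the codes.

```r
df1 <- data.frame(
  Key  = c(1, 2, 3, 4, 5),
  Code = c("X348", "I605", "B777", "I609", "F123")
)

# No Code starts with "Z99", so grep() returns integer(0)
no_hits <- grep("^Z99", df1$Code)

# -integer(0) is still integer(0), so this selects zero rows
nrow(df1[-no_hits, ])                    # 0

# The logical form keeps all rows when nothing matches
nrow(df1[!grepl("^Z99", df1$Code), ])    # 5
```

For that reason `!grepl(...)` or `grep(..., invert = TRUE)` is usually the safer choice when the pattern might not match any row.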