Opposite of %in%: exclude rows with values specified in a vector

Question

A categorical variable V1 in a data frame D1 can take values represented by the letters A to Z. I want to create a subset D2 that excludes some values, say B, N and T. Basically, I want the opposite of what %in% does here:

D2 = subset(D1, V1 %in% c("B", "N", "T"))

possible duplicate of [How I can select rows from a dataframe that do not match?](http://stackoverflow.com/questions/5812478/how-i-can-select-rows-from-a-dataframe-that-do-not-match) — Chase, Apr 29 '11 at 13:11

score 478 · Accepted Answer · edited Sep 27 '18 at 22:04

You can use the ! operator, which turns every TRUE into FALSE and every FALSE into TRUE:

D2 = subset(D1, !(V1 %in% c('B','N','T')))

EDIT: You can also make an operator yourself:

`%!in%` <- function(x, y) !(x %in% y)

c(1, 3, 11) %!in% 1:10
# [1] FALSE FALSE  TRUE
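
As a minimal sketch of both forms applied to a data frame, assuming a made-up `D1` (the sample rows and the `id` column are illustrative and not from the question):

```r
# Hypothetical sample data; only the column name V1 comes from the question.
D1 <- data.frame(id = 1:6,
                 V1 = c("A", "B", "N", "Q", "T", "Z"),
                 stringsAsFactors = FALSE)
exclude <- c("B", "N", "T")

# Negate the %in% test directly.
D2 <- subset(D1, !(V1 %in% exclude))

# Same rows through a user-defined infix operator.
`%!in%` <- function(x, y) !(x %in% y)
D2_op <- subset(D1, V1 %!in% exclude)

D2$V1                  # the values that remain
identical(D2, D2_op)   # both approaches select the same rows
```

Since `%in%` returns FALSE rather than NA for missing values, rows where V1 is NA pass the negated test and are kept.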

edited Sep 27 '18 at 22:04

JulienD

answered Apr 29 '11 at 12:10

Sacha Epskamp

5

The second option is illustrated on the help(match) page (which is where `?"%in%"` takes you), where the new operator is called `%w/o%`. – IRTFM Apr 29 '11 at 12:50
38

also, see `?Negate` e.g. `"%ni%" <- Negate("%in%")` – baptiste Jun 11 '11 at 06:09
2

Negate worked for me when used after defining the new operator, as suggested by baptiste, e.g. `subset(df, variable %ni% c("A", "B"))`, but not when used directly, e.g. `subset(df, variable Negate("%in%") c("A", "B"))` – PatrickT Oct 27 '15 at 09:20
4

@PatrickT That's because only operators can be used in infix position, and operators are either built in or have names that start and end with `%`. To create an operator, you need to assign a function of two arguments to a name starting and ending with `%`. – flying sheep Mar 15 '19 at 16:41
5

We can also use `filter(!(V1%in% c('B','N','T')))`. – ah bon Jan 29 '21 at 02:20
Another option is to use `slice(-which(V1 %in% Vector))` if using a single vector (e.g., `Vector <- seq(from=1, to=10, by=2)`) instead of a matrix, dataframe, or tibble – JJGabe Jan 10 '22 at 15:33
To clarify my above comment, `slice()` is usually used to remove rows by row number instead of by column values – JJGabe Jan 10 '22 at 18:28
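
To illustrate two points raised in these comments, here is a small base R sketch (the vectors are made up, and it shows plain indexing rather than `slice()` itself). A function returned by `Negate` can be called in prefix form even though it cannot be typed between its operands, and `-which(...)` needs care when nothing matches:

```r
x <- c("A", "B", "N", "Q")

# Prefix call: works without first binding the result to a %...% name.
keep <- Negate(`%in%`)(x, c("B", "N"))
x[keep]

# -which() pitfall: with no matches, which() returns integer(0),
# and negating an empty index selects nothing instead of everything.
no_match <- which(x %in% c("Y"))
x[-no_match]           # empty result
x[!(x %in% c("Y"))]    # all four values, as intended
```

The logical form with `!` does not have this edge case, which is one reason to prefer it over negative positions.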

Spencer Castro · Answer 2 · 2021-11-10T02:50:38.720

106

How about:

`%ni%` <- Negate(`%in%`)
c(1,3,11) %ni% 1:10
# [1] FALSE FALSE  TRUE

edited Nov 10 '21 at 02:50

answered Oct 21 '17 at 20:24

Spencer Castro

This one doesn't actually work; it throws an error, something about `SPECIAL` `%ni` – Flash Thunder Apr 30 '21 at 13:34
Still works just fine. R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16 – Spencer Castro Apr 30 '21 at 20:07
4

It's because `'` is not a backtick, and you should use backticks – Flash Thunder Apr 30 '21 at 22:36
The changes have been made. Thanks. – Spencer Castro Nov 10 '21 at 02:51

user29609 · Answer 3 · 2018-06-28T20:37:11.977

57

Here is a version using filter in dplyr that applies the same technique as the accepted answer by negating the logical with !:

library(dplyr)  # makes the %>% pipe available

D2 <- D1 %>% dplyr::filter(!V1 %in% c('B','N','T'))
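
The condition here is written without parentheses around the `%in%` test. That is safe because, in R's operator precedence, `%in%` binds more tightly than unary `!`. A quick base R check of that claim, using made-up values:

```r
V1 <- c("A", "B", "N", "Q", "T")
drop_values <- c("B", "N", "T")

without_parens <- !V1 %in% drop_values     # parsed as !(V1 %in% drop_values)
with_parens    <- !(V1 %in% drop_values)

identical(without_parens, with_parens)
V1[without_parens]
```

The explicit parentheses used in the accepted answer are therefore a readability choice, not a requirement.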

edited Jun 28 '18 at 20:37

answered May 17 '18 at 00:34

user29609

score 36 · Answer 4 · answered Apr 29 '11 at 13:16

If you look at the code of %in%

function (x, table) match(x, table, nomatch = 0L) > 0L

then you should be able to write your own version of the opposite. I use

`%not in%` <- function (x, table) is.na(match(x, table, nomatch=NA_integer_))

Another way is:

function (x, table) match(x, table, nomatch = 0L) == 0L
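
A short sketch to check that both `match`-based definitions agree with plain negation, including for a missing value. The name `not_in_zero` and the sample vectors are hypothetical, chosen only for this illustration:

```r
`%not in%` <- function(x, table) is.na(match(x, table, nomatch = NA_integer_))

# Hypothetical name for the second, unnamed definition above.
not_in_zero <- function(x, table) match(x, table, nomatch = 0L) == 0L

x   <- c("A", "B", NA, "Z")
tbl <- c("B", "N", "T")

a     <- x %not in% tbl
b     <- not_in_zero(x, tbl)
c_ref <- !(x %in% tbl)

a
identical(a, b) && identical(a, c_ref)
```

All three return a plain logical vector with no NA, because `match` reports a non-match through `nomatch` rather than propagating the missing value.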

EllaK · Answer 5 · 2018-05-22T17:12:50.600

17

Using negate from purrr also does the trick quickly and neatly:

`%not_in%` <- purrr::negate(`%in%`)

Then usage is, for example,

c("cat", "dog") %not_in% c("dog", "mouse")

edited May 22 '18 at 17:12

answered May 21 '18 at 16:24

EllaK

4

There’s also a built-in `Negate` that does the same. The only difference is that purrr calls `as_mapper` on the thing you pass, while `Negate` calls `match.fun`. https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/as_mapper https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.fun.html – flying sheep Mar 15 '19 at 16:44

score 9 · Answer 6 · answered May 09 '18 at 14:08

purrr::compose() is another quick way to define this for later use, as in:

`%!in%` <- purrr::compose(`!`, `%in%`)

answered May 09 '18 at 14:08

edavidaja

score 6 · Answer 7 · edited Mar 01 '18 at 09:44

Another solution could be using setdiff

D1 = c("A",..., "Z") ; D0 = c("B","N","T")

D2 = setdiff(D1, D0)

D2 is your desired subset.

edited Mar 01 '18 at 09:44

David Arenburg

answered Sep 06 '17 at 17:35

user3373954

Sometimes it can be useful, but it doesn't produce the same results if there are repetitions. – skan Oct 12 '20 at 15:38
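
A small sketch of the difference this comment describes, using a made-up vector with repeated values:

```r
x <- c("A", "B", "A", "C", "B")
drop_values <- "B"

setdiff(x, drop_values)       # set semantics: duplicates are collapsed
x[!(x %in% drop_values)]      # element-wise filter: duplicates are kept
```

`setdiff` also works on vectors of values, so on its own it does not pick rows out of a data frame the way a logical condition inside `subset` does.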

score 3 · Answer 8 · answered May 28 '20 at 04:32

Hmisc has a %nin% function, which should do this.

https://www.rdocumentation.org/packages/Hmisc/versions/4.4-0/topics/%25nin%25

answered May 28 '20 at 04:32

Matt

score 2 · Answer 9 · edited Jun 29 '20 at 19:17

library(roperators)

1 %ni% 2:10

If you frequently need to use custom infix operators, it is easier to just have them in a package rather than declaring the same exact functions over and over in each script or project.

edited Jun 29 '20 at 19:17

zx8754

answered May 07 '20 at 00:21

Benbob

While this may be a correct answer, it would be more useful with additional explanation of _why_ it works. Consider editing it to include further details, and if you feel it's better than the accepted answer which was posted nearly a decade ago. – Jeremy Caney May 07 '20 at 01:22

score 1 · Answer 10 · edited Jun 13 '22 at 23:16

The package collapse has it built in: %!in%.

edited Jun 13 '22 at 23:16

MYaseen208

answered Sep 30 '20 at 14:26

Marcio Rodrigues

Tony Ladson · Answer 11 · 2019-12-17T00:40:42.243

0

The help for %in%, help("%in%"), includes this definition of "not in" in its Examples section:

"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y

Let's try it:

c(2,3,4) %w/o% c(2,8,9)
# [1] 3 4

Alternatively

"%w/o%" <- function(x, y) !x %in% y #--  x without y
c(2,3,4) %w/o% c(2,8,9)
# [1] FALSE  TRUE  TRUE
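
The two definitions above share one name but return different things: the first returns the remaining values, the second a logical mask. A sketch that keeps them apart under two hypothetical names (`%without%` and `%notin%`, neither of which comes from the help page) and uses the mask to pick data frame rows:

```r
`%without%` <- function(x, y) x[!x %in% y]   # returns the remaining values
`%notin%`   <- function(x, y) !x %in% y      # returns a logical mask

vals <- c(2, 3, 4)
remaining <- vals %without% c(2, 8, 9)
mask      <- vals %notin% c(2, 8, 9)

# Only the logical mask is suitable as a row filter.
df <- data.frame(v = vals, label = c("a", "b", "c"), stringsAsFactors = FALSE)
kept <- df[mask, ]
kept
```

For the original question, which filters rows of D1, the logical form is the one that fits inside `subset`.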

edited Dec 17 '19 at 00:40

answered Mar 03 '19 at 10:59

Tony Ladson

score 0 · Answer 12 · answered Jun 19 '20 at 09:26

require(TSDT)

c(1,3,11) %nin% 1:10
# [1] FALSE FALSE  TRUE

For more information, you can refer to: https://cran.r-project.org/web/packages/TSDT/TSDT.pdf

answered Jun 19 '20 at 09:26

Vishal Sharma

score -2 · Answer 13 · answered Aug 18 '21 at 17:12

In Frank Harrell's package of R utility functions, he has a %nin% (not in) which does exactly what the original question asked. No need for wheel reinvention.

answered Aug 18 '21 at 17:12

Jim Hunter
