368

A categorical variable V1 in a data frame D1 can have values represented by the letters from A to Z. I want to create a subset D2 that excludes some values, say, B, N and T. Basically, I want a command which is the opposite of %in%.

D2 = subset(D1, V1 %in% c("B", "N", "T"))
Henrik
user702432
  • 95
    not %in%? (`!(x %in% y)`). Life can be easy sometimes... – Joris Meys Apr 29 '11 at 12:17
  • possible duplicate of [How I can select rows from a dataframe that do not match?](http://stackoverflow.com/questions/5812478/how-i-can-select-rows-from-a-dataframe-that-do-not-match) – Chase Apr 29 '11 at 13:11

13 Answers

478

You can use the ! operator to turn every TRUE into FALSE and every FALSE into TRUE. So:

D2 = subset(D1, !(V1 %in% c('B','N','T')))

EDIT: You can also make an operator yourself:

'%!in%' <- function(x,y)!('%in%'(x,y))

c(1,3,11)%!in%1:10
[1] FALSE FALSE  TRUE
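
As a minimal sketch of both variants applied to a data frame (the data frame `D1` below is made up for illustration and is not the asker's data):

```r
# Hypothetical example data: V1 holds the letters A to Z, one per row
D1 <- data.frame(V1 = LETTERS, n = seq_along(LETTERS), stringsAsFactors = FALSE)

# Variant 1: negate the result of %in% directly
D2 <- subset(D1, !(V1 %in% c("B", "N", "T")))

# Variant 2: a custom infix operator, defined as in the edit above
`%!in%` <- function(x, y) !(x %in% y)
D3 <- subset(D1, V1 %!in% c("B", "N", "T"))

nrow(D2)           # 23 rows remain (26 letters minus the 3 excluded)
identical(D2, D3)  # checks that both variants select the same rows
```

The parentheses around `V1 %in% ...` are not strictly required, but they make it obvious that `!` applies to the whole membership test.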
JulienD
Sacha Epskamp
  • 5
    The use of second option is illustrated in the help(match) page (where you would get to if you typed `?"%in%"` ) where the new operator is called `%w/o%`. – IRTFM Apr 29 '11 at 12:50
  • 38
    also, see `?Negate` e.g. `"%ni%" <- Negate("%in%")` – baptiste Jun 11 '11 at 06:09
  • 2
    Negate worked for me when used after defining the new operator, as suggested by baptiste, e.g. `subset(df, variable %ni% c("A", "B"))`, but not when used directly, e.g. `subset(df, variable Negate("%in%") c("A", "B"))` – PatrickT Oct 27 '15 at 09:20
  • 4
    @PatrickT That’s because only operators can be used as operators, and operators are either built-in or start and end with `%`. To create an operator, you need to assign a function with two arguments to a name starting and ending with `%`. – flying sheep Mar 15 '19 at 16:41
  • 5
    We can also use `filter(!(V1%in% c('B','N','T')))`. – ah bon Jan 29 '21 at 02:20
  • Another option is to use `slice(-which(V1 %in% Vector))` if using a single vector (e.g., `Vector <- seq(from=1, to=10, by=2)`) instead of a matrix, dataframe, or tibble – JJGabe Jan 10 '22 at 15:33
  • To clarify my above comment, `slice()` is usually used to remove rows by row number instead of by column values – JJGabe Jan 10 '22 at 18:28
106

How about:

`%ni%` <- Negate(`%in%`)
c(1,3,11) %ni% 1:10
# [1] FALSE FALSE  TRUE
Spencer Castro
57

Here is a version using filter in dplyr that applies the same technique as the accepted answer by negating the logical with !:

D2 <- D1 %>% dplyr::filter(!V1 %in% c('B','N','T'))
user29609
36

If you look at the code of %in%

 function (x, table) match(x, table, nomatch = 0L) > 0L

then you should be able to write your version of opposite. I use

`%not in%` <- function (x, table) is.na(match(x, table, nomatch=NA_integer_))

Another way is:

function (x, table) match(x, table, nomatch = 0L) == 0L
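
A quick check, as a sketch with made-up input, that the two definitions above agree with simply negating %in%, including when `x` contains NA (the names `not_in_a` and `not_in_b` are placeholders, not part of the answer):

```r
# Placeholder names for the two definitions shown above
not_in_a <- function(x, table) is.na(match(x, table, nomatch = NA_integer_))
not_in_b <- function(x, table) match(x, table, nomatch = 0L) == 0L

x   <- c("A", "B", NA, "Q")   # made-up input with a missing value
tbl <- c("B", "N", "T")

not_in_a(x, tbl)
not_in_b(x, tbl)
!(x %in% tbl)

# All three give the same logical vector for this input
stopifnot(identical(not_in_a(x, tbl), not_in_b(x, tbl)),
          identical(not_in_a(x, tbl), !(x %in% tbl)))
```

Because `match()` reports a missing match through `nomatch` rather than returning NA from a comparison, an NA in `x` is treated as "not in the table" here, so the result contains no NA values.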
Marek
17

Using negate from purrr also does the trick quickly and neatly:

`%not_in%` <- purrr::negate(`%in%`)

Then usage is, for example,

c("cat", "dog") %not_in% c("dog", "mouse")
EllaK
  • 4
    There’s also a built-in `Negate` that does the same. The only difference is that purrr calls `as_mapper` on the thing you pass, while `Negate` calls `match.fun`. https://www.rdocumentation.org/packages/purrr/versions/0.2.5/topics/as_mapper https://stat.ethz.ch/R-manual/R-devel/library/base/html/match.fun.html – flying sheep Mar 15 '19 at 16:44
9

purrr::compose() is another quick way to define this for later use, as in:

`%!in%` <- purrr::compose(`!`, `%in%`)
edavidaja
6

Another solution could be using setdiff

D1 = LETTERS ; D0 = c("B","N","T")  # LETTERS is the built-in vector "A" to "Z"

D2 = setdiff(D1, D0)

D2 is your desired subset.
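
One caveat: setdiff() treats its arguments as sets, so repeated values are collapsed, and it works on vectors rather than on the rows of a data frame. A small sketch with a made-up vector (the names `x` and `excluded` are just for illustration):

```r
x        <- c("A", "B", "A", "C", "T", "C")  # made-up vector with repeats
excluded <- c("B", "N", "T")

by_setdiff <- setdiff(x, excluded)      # unique values only: "A" "C"
by_negate  <- x[!(x %in% excluded)]     # keeps repeats: "A" "A" "C" "C"

by_setdiff
by_negate
```

If the goal is to filter rows of a data frame and keep every matching row, the negated %in% form is the safer choice.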

David Arenburg
user3373954
  • Sometimes it can be useful, but it doesn't produce the same results if there are repetitions. – skan Oct 12 '20 at 15:38
3

Hmisc has a %nin% function, which should do this.

https://www.rdocumentation.org/packages/Hmisc/versions/4.4-0/topics/%25nin%25

Matt
2
library(roperators)

1 %ni% 2:10

If you frequently need to use custom infix operators, it is easier to just have them in a package rather than declaring the same exact functions over and over in each script or project.

zx8754
Benbob
  • While this may be a correct answer, it would be more useful with additional explanation of _why_ it works. Consider editing it to include further details, and if you feel it's better than the accepted answer which was posted nearly a decade ago. – Jeremy Caney May 07 '20 at 01:22
1

The package has it built in: %!in%.

MYaseen208
Marcio Rodrigues
0

The help for %in%, help("%in%"), includes, in the Examples section, this definition of not in,

"%w/o%" <- function(x, y) x[!x %in% y] #-- x without y

Let's try it:

c(2,3,4) %w/o% c(2,8,9)
[1] 3 4

Alternatively

"%w/o%" <- function(x, y) !x %in% y #--  x without y
c(2,3,4) %w/o% c(2,8,9)
# [1] FALSE  TRUE  TRUE
Tony Ladson
0
require(TSDT)

c(1,3,11) %nin% 1:10
# [1] FALSE FALSE  TRUE

For more information, you can refer to: https://cran.r-project.org/web/packages/TSDT/TSDT.pdf

Vishal Sharma
-2

In Frank Harrell's package of R utility functions, he has a %nin% (not in) which does exactly what the original question asked. No need for wheel reinvention.

Jim Hunter