17

Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.

For example:

Given a data frame such as:

  a   b
1 1 cat
2 2 dog

Is there an existing function which will allow me to test whether the row (1, cat) exists in the data frame?

Thanks, Zach

Zach

7 Answers

25

Try match_df from plyr (using Marek's sample data):

library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")

match_df(X, row_to_find)
hadley
8

Using the data from @Marek's answer:

nrow(merge(row_to_find,X))>0 # TRUE if exists
Wojciech Sobala
  • For me, this is also the fastest! – agoldev Feb 06 '17 at 17:23
  • 2
    This can be dangerous. Make sure row_to_find is a `data.frame`; otherwise merge() will not match on the entire row. It works on @Marek's sample data, as stated, but once you pass c(1,"cat") as row_to_find it will always return TRUE. – agoldev Jul 28 '17 at 17:40
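
The pitfall described in this comment can be reproduced in base R. The following is only a sketch: it assumes that merge() coerces a bare vector to a one-column data frame sharing no column names with X, so the join degenerates into a Cartesian product. The helper name row_exists_merge is hypothetical, not part of any package.

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))

# A bare vector shares no column names with X, so merge() falls back to a
# Cartesian product and the row count is positive whatever the values are.
nrow(merge(c(3, "horse"), X)) > 0
#> [1] TRUE

# Hypothetical guard: insist on a data frame with the same columns as df
# and name the join columns explicitly.
row_exists_merge <- function(df, row) {
  stopifnot(is.data.frame(row), setequal(names(row), names(df)))
  nrow(merge(row, df, by = names(df))) > 0
}

row_exists_merge(X, data.frame(a = 1, b = "cat"))
#> [1] TRUE
row_exists_merge(X, data.frame(a = 3, b = "horse"))
#> [1] FALSE
```

Passing by explicitly means a mismatch in column names raises an error instead of silently turning into a cross join.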
7

Taking your example:

X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be data.frame (not a vector) to hold different types

Then

duplicated(rbind(X, row_to_find))[nrow(X)+1]

gives you the answer.
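
The rbind/duplicated idea can be wrapped in a small function. This is a minimal sketch, assuming the row is supplied as a one-row data frame with the same column names as the table; row_exists is a hypothetical name chosen for illustration.

```r
# Hypothetical helper: TRUE if the one-row data frame `row` already occurs in `df`.
row_exists <- function(df, row) {
  stopifnot(is.data.frame(row), nrow(row) == 1,
            identical(names(row), names(df)))
  combined <- rbind(df, row)
  # The appended row is flagged as a duplicate only if an equal row precedes it.
  duplicated(combined)[nrow(combined)]
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_exists(X, data.frame(a = 1, b = "cat"))
#> [1] TRUE
row_exists(X, data.frame(a = 3, b = "horse"))
#> [1] FALSE
```

Only the last element of the duplicated() result is inspected, so duplicates that already exist inside df do not affect the answer.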

Marek
1

I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):

tail(duplicated(rbind(X,row_to_find)),1)>0
Community
1

For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence, if the result of dplyr::anti_join(row, df) has zero rows, then row was in df; if it has one row, then row was not in df.

library(dplyr)

df <- tribble(~a, ~b,
              1,  "cat",
              2,  "dog")
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat  
#> 2  2.00 dog

row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat

nrow(anti_join(row, df)) == 0  # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE

row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  3.00 horse

nrow(anti_join(row, df)) == 0  # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE
Rory Nolan
0

For a vector y with the same number of elements as there are columns in the data frame dfrm:

apply(dfrm, 1, function(x) all( x == y) )

This should return a logical vector, which can in turn be used as a row index in [ , ]:

dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]

The identical function is probably too stringent, since it will check attributes as well.

> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE
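
One caveat with the apply() approach above: apply() first converts the data frame with as.matrix(), and for a data frame of mixed types the numeric columns become formatted strings, which can be padded with spaces and then fail the == test. The sketch below compares column by column instead. It assumes y is given as a list so that each value keeps its own type; row_matches is a hypothetical helper name.

```r
# Hypothetical helper: compare each column of `df` with the corresponding
# element of the list `y`, then combine the per-column results with `&`.
row_matches <- function(df, y) {
  stopifnot(length(y) == ncol(df))
  Reduce(`&`, Map(function(col, val) col == val, df, y))
}

dfrm <- data.frame(a = c(1, 10), b = c("cat", "dog"))
y <- list(1, "cat")

row_matches(dfrm, y)          # logical vector, one element per row
#> [1]  TRUE FALSE
any(row_matches(dfrm, y))     # does the row exist at all?
#> [1] TRUE
dfrm[row_matches(dfrm, y), ]  # the matching rows themselves
```

Rows containing NA yield NA rather than FALSE in the comparison, so NA values may need separate handling.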
IRTFM
0

Another approach, using base R:

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE
David Rubinger