Existing function for seeing if a row exists in a data frame?

Question

Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.

For example, given a data frame like this:

  a   b
1 1 cat
2 2 dog

Is there an existing function which will allow me to test whether the row (1, cat) exists in the data frame?

Thanks, Zach

score 25 · Answer 1 · answered May 07 '11 at 00:10

Try match_df from plyr (using Marek's sample data):

library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")

match_df(X, row_to_find)

hadley

Hmmm. I can't seem to find match_df in the plyr library. – Zach May 09 '11 at 16:14
Do you have the latest version? – hadley May 09 '11 at 16:48
great answer...thanks. Should be the accepted solution imho. – tumultous_rooster Aug 14 '15 at 19:41

score 8 · Accepted Answer · answered May 07 '11 at 15:18

Using the data from @Marek's answer:

nrow(merge(row_to_find,X))>0 # TRUE if exists

Wojciech Sobala

For me, this is also the fastest! – agoldev Feb 06 '17 at 17:23
This can be dangerous. Make sure row_to_find is a `data.frame`; otherwise merge will not match on the entire row. It works on @Marek's sample data, as stated, but once you pass c(1,"cat") as row_to_find it will always return TRUE. – agoldev Jul 28 '17 at 17:40
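
One way to guard against the pitfall described in the comment above is to wrap the merge test in a small function that refuses anything other than a one-row data frame. This is only a sketch: `row_exists` is a hypothetical helper name, not an existing base R function, and it assumes both inputs have the same column names.

```r
# Hypothetical helper (not part of base R): TRUE if `row` occurs in `df`.
row_exists <- function(df, row) {
  # Reject vectors such as c(1, "cat"), which merge() would turn into a
  # data frame with no shared columns and then cross join.
  stopifnot(is.data.frame(df), is.data.frame(row), nrow(row) == 1)
  # Require identical column names so the join covers the entire row.
  stopifnot(setequal(names(df), names(row)))
  nrow(merge(row, df, by = names(df))) > 0
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_exists(X, data.frame(a = 1, b = "cat"))    # TRUE
row_exists(X, data.frame(a = 3, b = "horse"))  # FALSE
# row_exists(X, c(1, "cat")) stops with an error instead of returning TRUE
```

Passing `by` explicitly makes the join columns visible in the code rather than leaving merge to infer them from the shared names.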

score 7 · Answer 3 · answered May 06 '11 at 21:10

Taking your example:

X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be data.frame (not a vector) to hold different types

Then

duplicated(rbind(X, row_to_find))[nrow(X)+1]

gives you the answer.

Marek

Thanks. I imagine duplicated might run a touch slower than apply though. – Zach May 06 '11 at 21:17
I'd expect duplicated to be considerably faster than apply – hadley May 07 '11 at 00:10
It is quite a bit faster. Off the top of my head I was thinking n vs n^2 comparisons. Thanks. – Zach May 09 '11 at 16:13

score 1 · Answer 4 · edited May 23 '17 at 11:46

I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):

tail(duplicated(rbind(X,row_to_find)),1)  # already logical, so no >0 is needed

answered Mar 26 '17 at 14:30

Laurent Camus

score 1 · Answer 5 · answered Feb 09 '18 at 16:32

For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence, if the result of dplyr::anti_join(row, df) has zero rows, then row was indeed in df; if it has one row, then row was not in df.

library(dplyr)

df <- tribble(~a, ~b,
              1,  "cat",
              2,  "dog")
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat  
#> 2  2.00 dog

row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat

nrow(anti_join(row, df)) == 0  # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE

row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  3.00 horse

nrow(anti_join(row, df)) == 0  # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE
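
The same check can also be phrased positively with dplyr's other filtering join, semi_join(), which keeps the rows of its first argument that have a match in the second. The following is a minimal sketch along those lines, not part of the original answer; passing `by` explicitly is a choice made here to avoid the "Joining, by" message.

```r
library(dplyr)

df <- tibble(a = c(1, 2), b = c("cat", "dog"))

# A row that is present: semi_join() keeps it, so the result has one row
present <- tibble(a = 1, b = "cat")
nrow(semi_join(present, df, by = c("a", "b"))) > 0  # TRUE

# A row that is absent: semi_join() drops it, so the result has zero rows
absent <- tibble(a = 3, b = "horse")
nrow(semi_join(absent, df, by = c("a", "b"))) > 0   # FALSE
```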

score 0 · Answer 6 · answered May 06 '11 at 21:03

For a vector, y, with the same number of elements as there are columns in the data frame, dfrm:

apply(dfrm, 1, function(x) all( x == y) )

This should return a logical vector, which can in turn be used as a row index in [ , ]:

dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]

The identical function is probably too stringent, since it will check attributes as well.

> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE

IRTFM

Thanks. So no existing function? It seems like a pretty common problem. – Zach May 06 '11 at 21:07
You could use merge. With Marek's example, try: `merge(x, row_to_find, 1,1)` – IRTFM May 06 '11 at 21:47
`all(x == y)` will be buggy because it will coerce x and y to be the same type. – hadley May 07 '11 at 00:09
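
The coercion issue raised in the last comment can be seen in a small example. This is an illustrative sketch with made-up data (the values 1 and 10 are chosen here to expose the problem): apply() converts the data frame to a matrix first, and with mixed column types that matrix is character, with numbers formatted to a common width.

```r
dfrm <- data.frame(a = c(1, 10), b = c("cat", "dog"))
y <- c(1, "cat")            # c() coerces 1 to the string "1"

m <- as.matrix(dfrm)        # the conversion apply() performs internally
m[, "a"]                    # " 1" "10": strings, padded to equal width

found <- apply(dfrm, 1, function(x) all(x == y))
any(found)                  # FALSE, although the row (1, cat) is in dfrm
```

Here the comparison fails because " 1" is not equal to "1", so the row is reported as missing even though it exists.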

score 0 · Answer 7 · answered Jul 02 '21 at 16:17

Another approach, using base R:

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE
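
The comparison above names each column by hand. A possible generalization to any number of columns, sketched here with base R's Map() and Reduce(), compares each column with its own target value and then combines the results. The `target` list and the `hits` variable are names chosen for this illustration.

```r
df <- data.frame(a = c(1, 10), b = c("cat", "dog"))
target <- list(a = 1, b = "cat")   # one value per column, each keeping its own type

# Compare every column with its target, then AND the per-column results
hits <- Reduce(`&`, Map(`==`, df[names(target)], target))
hits       # TRUE FALSE: only the first row matches on all columns
any(hits)  # TRUE
```

Because each column is compared separately, nothing is coerced to a common type, which avoids the pitfall of comparing a whole row against a character vector.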

David Rubinger
