Check whether a row with values belongs to a data frame in R

Question

Possible Duplicate:
Existing function for seeing if a row exists in a data frame?

Suppose I have the following data frame in R.

df <- data.frame(a = 1:3, b = 4:6)

This data frame contains three rows: (1, 4), (2, 5) and (3, 6). Suppose I did not know which rows df contains and wanted to check whether the row (1, 4) belongs to it. How can I check that?

My actual case involves comparing 27 parameter values. Is there a solution that does not require typing each and every parameter name? Thanks!

The reason I want to do this is that I have an R dataset called masterdata containing simulation data. I want to update it with new data as I make additional simulation runs with different parameter combinations. However, I may forget that I have already run the simulation for a certain parameter combination and run it again, in which case masterdata will be expanded with duplicate rows. I could remove these duplicates later, but I would rather not let the update go through at all if the values are duplicates. For this I need to check whether the data from a simulation run is already present in masterdata, which I can do if I know how to check whether a given row belongs to masterdata.

Thanks.

You might find some ideas in this earlier question: [Existing function for seeing if a row exists in a data frame?](http://stackoverflow.com/questions/5916854/existing-function-for-seeing-if-a-row-exists-in-a-data-frame) — Marek, Jun 13 '11 at 09:21
There are two solutions there, one by you (which is similar to the one here) and one by Hadley. Is one faster than the other? Thanks. — Curious2learn, Jun 13 '11 at 11:34
@Curious2learn I think it depends on the data: number of rows, number of columns and types of columns. — Marek, Jun 13 '11 at 11:59
@Curious2learn I ran some tests and it seems that Hadley's is much faster (about 3x faster for a wide data.frame). — Marek, Jun 13 '11 at 12:27
I vote to reopen: the OP's real aim is to remove duplicated rows, so this is a different question from the previous one. — mbq, Jun 13 '11 at 17:39

score 6 · Answer 1 · answered Jun 13 '11 at 02:15


There may be more efficient ways, but I think

tail(duplicated(rbind(masterdata,newvals)),1)

will do it: in other words, attach the new row to the end of the data frame and see whether it is duplicated or not.
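A minimal runnable sketch of this idea, assuming masterdata and newvals are data frames with the same column names and that newvals holds a single row (the toy values below are made up). Because duplicated() compares whole rows, this works for any number of columns, including the 27-parameter case, without naming them. The if (!is_dup) guard is just one hypothetical way to skip the update for a combination that has already been run.

```r
# Toy stand-ins for the real data (hypothetical values)
masterdata <- data.frame(a = 1:3, b = 4:6)
newvals    <- data.frame(a = 1,   b = 4)   # assumed to be a single row

# Append the candidate row and ask whether that last row repeats an earlier one
is_dup <- tail(duplicated(rbind(masterdata, newvals)), 1)
is_dup
# [1] TRUE

# Only grow masterdata when the parameter combination is new
if (!is_dup) {
  masterdata <- rbind(masterdata, newvals)
}
nrow(masterdata)
# [1] 3
```

Note that tail(..., 1) only looks at the last row, so if newvals contains several runs you would check the last nrow(newvals) entries instead.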


Ben Bolker


[I agree with your answer](http://stackoverflow.com/questions/5916854/existing-function-for-seeing-if-a-row-exists-in-a-data-frame/5917042#5917042) ;) – Marek Jun 13 '11 at 09:31

score 2 · Answer 2 · answered Jun 13 '11 at 02:06


If you want to compare only two columns in the data.frame, then this does the trick:

> which(df$a+df$b*1i == 1+4i)
[1] 1

This may or may not be faster than other vectorized solutions.
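A short sketch of why the trick works: the two numeric columns are packed into one complex vector (a + b*1i), so a single == compares both values at once, and wrapping it in any() gives the yes/no answer the question asks for. This assumes both columns are numeric, and it does not extend beyond two columns.

```r
df <- data.frame(a = 1:3, b = 4:6)

# Pack (a, b) into a single complex number per row: a + b*i
key <- df$a + df$b * 1i

# Indices of rows matching (1, 4)
which(key == 1 + 4i)
# [1] 1

# TRUE if at least one row matches
any(key == 1 + 4i)
# [1] TRUE
any(key == 2 + 4i)
# [1] FALSE
```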


kohske


score 1 · Answer 3 · answered Jun 13 '11 at 01:59


There are quite a few ways to do this. You can use ifelse(), which is vectorized, to return 1 or 0 for each row of your data frame depending on whether it meets your conditions.

> with(df, ifelse(a == 1 & b == 4, 1, 0))
[1] 1 0 0

Since you are probably only interested in knowing whether your parameter combination has been run at all, you can wrap sum() around the previous command:

> sum(with(df, ifelse(a == 1 & b == 4, 1, 0)))
[1] 1
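Since a == 1 & b == 4 already yields a logical vector, the ifelse() step can be dropped: any() answers the yes/no question directly, and sum() on the logical vector counts the matches (TRUE counts as 1). A small sketch using the same df:

```r
df <- data.frame(a = 1:3, b = 4:6)

# Logical vector: TRUE where the row matches (1, 4)
hits <- with(df, a == 1 & b == 4)
hits
# [1]  TRUE FALSE FALSE

any(hits)   # has this combination been run at all?
# [1] TRUE
sum(hits)   # how many times?
# [1] 1
```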

Another alternative is to use nrow() and subset(). We'll again use the & operator for our testing:

> nrow(subset(df, a == 1 & b == 4))
[1] 1
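For the 27-parameter case raised in the comments, one way to avoid typing every column name is to build the condition programmatically. The sketch below assumes the candidate row is a one-row data frame (newvals, a hypothetical name) whose column names match those in df; Map() compares each column with its target value and Reduce() combines the per-column results with &. The comparison uses exact ==, so floating-point parameter values must match exactly.

```r
df      <- data.frame(a = 1:3, b = 4:6)
newvals <- data.frame(a = 1, b = 4)   # hypothetical candidate row

# One logical vector per column: does df's column equal the target value?
per_col <- Map(function(col, val) col == val, df[names(newvals)], newvals)

# A row matches only if every column matches
matches <- Reduce(`&`, per_col)
matches
# [1]  TRUE FALSE FALSE

any(matches)
# [1] TRUE
```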


Chase


My actual case involves comparison of 27 parameter values. Is there a vectorized solution so that I do not have to type each and every parameter name? Thanks! – Curious2learn Jun 13 '11 at 02:17
@Curious2learn - see @Ben's answer for the path to enlightenment. He's steering you in the right direction there. – Chase Jun 13 '11 at 11:27

score -1 · Answer 4 · answered Jun 13 '11 at 08:02


You don't need any more than a single unique call:

Test <- data.frame(a = c(1, 2, 2, 2, 3), b = c(1, 2, 2, 3, 3), c = c(1, 2, 2, 2, 3))
Test          # original data; rows 2 and 3 are identical
unique(Test)  # same data with duplicated rows removed
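If the goal is simply to keep masterdata free of repeats, a minimal sketch of this approach is to append the new runs and then drop duplicated rows, either with unique() or with the equivalent !duplicated() index. The names masterdata and newvals are borrowed from the question and the other answers; the values are made up.

```r
masterdata <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
newvals    <- data.frame(a = c(1, 7),    b = c(4, 8))   # (1, 4) is a repeat

combined <- rbind(masterdata, newvals)

# Keep the first occurrence of each distinct row
cleaned <- unique(combined)
nrow(cleaned)
# [1] 4

# The same rows, selected via duplicated()
identical(cleaned, combined[!duplicated(combined), ])
# [1] TRUE
```

Unlike the check-before-append approach, this cleans up after the fact, so the duplicate run still happens but never survives in the stored data.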


mbq



That is not close to what the OP asked :( – agoldev Feb 06 '17 at 18:01
