I'm trying to remove from one data frame every row whose values in certain columns also appear in a second data frame.

I'm using the R programming language for statistical data analysis.

This is my first question here, so please bear with me ;)

I work with confidential data, so I recreated the problem with an example.

Name=c("Bussieres", "Nelson")
Fname=c("Paul", "Robert")
Tel=c(123,234)
comp1=data.frame(Name, Fname, Tel)

Name=c("Bussieres","Bussieres","Nelson","Nelson")
Fname=c("Robert","Paul","Paul","Paula")
Tel=c(123,234,345,456)
comp2=data.frame(Name, Fname, Tel)

comp1 returns:

       Name  Fname Tel
1 Bussieres   Paul 123
2    Nelson Robert 234

comp2 returns:

       Name  Fname Tel
1 Bussieres Robert 123
2 Bussieres   Paul 234
3    Nelson   Paul 345
4    Nelson  Paula 456

Now, what I want is to return the rows of comp1 whose combination of "Name" and "Fname" does not appear in any row of comp2.

The expected result, to be stored in a new data frame comp3, would be (edited: the expected result I originally posted was erroneous):

    Name  Fname Tel
1 Nelson Robert 234

My first attempts used the match function, but that didn't quite work.

The following attempt at a for loop also didn't work:

for (i in comp1[,"Name"]){for (j in comp3[,"Name"]){if i!=j return comp3=x1["Name"==i,]}}

I'm surprised that I can't find basic (primitive) functions in R to do this, as excluding certain observations from a data set would be a very routine procedure.

Matt Dowle
  • Have you tried this? http://www.cookbook-r.com/Manipulating_data/Comparing_data_frames/ and this is a similar problem: http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in – Ben Feb 02 '13 at 01:02

1 Answer

A data.table solution:

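library(data.table)
# Key both tables on the columns to compare
dt1 <- data.table(comp1, key=c("Name", "Fname"))
dt2 <- data.table(comp2, key=c("Name", "Fname"))
# Anti-join: keep the rows of dt1 that have no match in dt2 on the key
dt1[!dt2]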

#      Name  Fname Tel
# 1: Nelson Robert 234
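
For comparison, the same anti-join can be sketched in base R, without data.table. This is only a minimal sketch: the helper vectors key1 and key2 are names introduced here for illustration, and the "|" separator assumes that character never occurs in Name or Fname.

```r
# Recreate the example data from the question.
comp1 <- data.frame(Name  = c("Bussieres", "Nelson"),
                    Fname = c("Paul", "Robert"),
                    Tel   = c(123, 234))
comp2 <- data.frame(Name  = c("Bussieres", "Bussieres", "Nelson", "Nelson"),
                    Fname = c("Robert", "Paul", "Paul", "Paula"),
                    Tel   = c(123, 234, 345, 456))

# Build one composite key per row from the two columns being compared.
# Assumption: "|" does not appear in any Name or Fname value.
key1 <- paste(comp1$Name, comp1$Fname, sep = "|")
key2 <- paste(comp2$Name, comp2$Fname, sep = "|")

# Keep only the rows of comp1 whose key is absent from comp2.
comp3 <- comp1[!(key1 %in% key2), ]
comp3
```

Unlike the data.table version, comp3 keeps the original row name from comp1; rownames(comp3) <- NULL resets it if that matters.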
Arun
  • Sorry, I meant different, so I'll correct the output in the question. – Gabriel Bergevin-Estable Feb 02 '13 at 00:09
  • The answer proposed by Arun worked. Thanks! I installed the data.table package and ran the above script. To eliminate errors from data collection, I also capitalised the Name and Fname values. I did this in Excel, however, so I'll look for a way to do it in R another time ;) – Gabriel Bergevin-Estable Feb 02 '13 at 21:47
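
The capitalisation step mentioned in the last comment could plausibly be done in R as well. The sketch below assumes "capitalised" means converting to upper case, and it also trims stray whitespace; the data frame people and its values are made up for illustration.

```r
# Hypothetical example data with inconsistent case and stray spaces.
people <- data.frame(Name  = c("nelson", "Bussieres "),
                     Fname = c("robert", " Paul"),
                     stringsAsFactors = FALSE)

# trimws() strips leading and trailing whitespace; toupper() upper-cases.
people$Name  <- toupper(trimws(people$Name))
people$Fname <- toupper(trimws(people$Fname))
people
```

Normalising both data frames this way before the comparison means that rows differing only in case or padding are treated as the same person.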