Determining different rows between two data sets in R

Question

I have two data files in tab-separated CSV format. Both files have the following layout:

EP Code    EP Name    Address    Region    ...
101654    Alpha     York Street    Northwest    ...
103628    Beta    5th Avenue    South    ...

EP codes are unique. I want to compare the two files by EP code, find the rows that differ, and write them to a new file.

For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could contain all of file1 plus 50 extra rows, or it could be file1 minus 10 rows plus 60 new rows. I want to find the differences between the two data sets. I'm not interested in the rows they have in common.

How can I do that in R?

Closely related: http://stackoverflow.com/questions/1837968/r-how-to-tell-what-is-in-one-vector-and-not-another. — Shane, Jun 28 '10 at 13:54

Shane · Accepted Answer · 2010-06-28T14:30:11.773


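There are many ways to do this, including setdiff, intersect, the %in% operator, and is.element. Note that setdiff and intersect return EP code values, not row positions, so convert them to a logical row mask with %in% before subsetting. To keep the rows of file1 whose codes are missing from file2:

diff1 <- file1[file1$ep.code %in% setdiff(file1$ep.code, file2$ep.code), ]

or, to find the intersecting set and exclude it from file2 using !:

diff2 <- file2[!(file2$ep.code %in% intersect(file2$ep.code, file1$ep.code)), ]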
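For illustration, here is a minimal end-to-end sketch of the %in% approach. It assumes two small made-up data frames standing in for file1.csv and file2.csv (real files would typically be loaded with read.delim). The ep.name column, the source column, and the names only.in.file1, only.in.file2, differences and out.path are hypothetical and not part of the question.

```r
# Made-up sample data standing in for file1.csv and file2.csv (assumption);
# the column name ep.code follows the snippets above.
file1 <- data.frame(
  ep.code = c(101654, 103628, 104000),
  ep.name = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  ep.code = c(103628, 104000, 105111, 106222),
  ep.name = c("Beta", "Gamma", "Delta", "Epsilon"),
  stringsAsFactors = FALSE
)

# Rows whose EP code appears in only one of the two data sets
only.in.file1 <- file1[!(file1$ep.code %in% file2$ep.code), ]
only.in.file2 <- file2[!(file2$ep.code %in% file1$ep.code), ]

# Tag each row with its origin (rep() also copes with zero-row results)
only.in.file1$source <- rep("file1", nrow(only.in.file1))
only.in.file2$source <- rep("file2", nrow(only.in.file2))

# Stack both one-sided differences; the rows common to both files are dropped
differences <- rbind(only.in.file1, only.in.file2)

# Write the result as a tab-separated file (a temporary path is used here)
out.path <- tempfile(fileext = ".csv")
write.table(differences, out.path, sep = "\t", row.names = FALSE, quote = FALSE)
```

Because EP codes are unique, comparing on that one column should be enough to tell the rows apart. The source column is optional and only records which file each differing row came from.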

edited Jun 28 '10 at 14:30

answered Jun 28 '10 at 13:39

Shane

Thanks for the answer. I think there should be an extra right-parenthesis before the last comma in your last code line. – Mehper C. Palavuzlar Jun 28 '10 at 14:25
