
I have two data files in tab-separated format (saved with a .csv extension). Both files have the following layout:

EP Code    EP Name    Address    Region    ...
101654    Alpha     York Street    Northwest    ...
103628    Beta    5th Avenue    South    ...

EP codes are unique. I want to compare the two files by EP code, find the rows that differ, and write those rows to a new file.

For example, file1.csv has 800 rows and file2.csv has 850 rows. file2 could contain all of file1 plus 50 new rows, or it could be file1 minus 10 rows plus 60 new rows. I want to find the differences between the two data sets; I'm not interested in the rows they have in common.

How can I do that in R?

Brian Tompsett - 汤莱恩
Mehper C. Palavuzlar
Closely related: http://stackoverflow.com/questions/1837968/r-how-to-tell-what-is-in-one-vector-and-not-another. – Shane Jun 28 '10 at 13:54
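The linked question works with plain vectors. A quick sketch with a few hypothetical EP codes (not taken from the real files) shows what setdiff, intersect, %in% and is.element each return when applied to two key columns:

```r
# Hypothetical EP codes standing in for the key columns of file1 and file2
codes1 <- c(101654, 103628, 104001, 105222)
codes2 <- c(101654, 103628, 106500)

setdiff(codes1, codes2)     # codes only in file1: 104001 105222
setdiff(codes2, codes1)     # codes only in file2: 106500
intersect(codes1, codes2)   # codes in both files: 101654 103628
codes1 %in% codes2          # one logical per element of codes1: TRUE TRUE FALSE FALSE
is.element(codes1, codes2)  # same result as %in%
```

setdiff and intersect return the codes themselves, while %in% and is.element return a logical vector aligned with their first argument, which is what you need for subsetting rows of a data frame.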

1 Answer


There are many ways to do this, including setdiff, intersect, the %in% operator, and is.element. For example, take the EP codes that occur in only one file with setdiff, or find the shared codes with intersect and exclude them with !:

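diff1 <- file1[file1$EP.Code %in% setdiff(file1$EP.Code, file2$EP.Code), ]

and, for the rows that appear only in file2,

diff2 <- file2[!(file2$EP.Code %in% intersect(file2$EP.Code, file1$EP.Code)), ]

(read.delim() converts the header "EP Code" into the column name EP.Code. The codes must be turned into a logical vector with %in% before subsetting; indexing with the codes themselves would select rows by position.)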
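The question also asks for the differing rows to be written to a new file. Here is a minimal sketch, assuming both files are tab-separated with a header row; the function name diff_by_ep_code, the extra Source column, and the sample data are hypothetical:

```r
# Minimal sketch: collect the rows whose EP code appears in only one of two
# tab-separated files and write them to a new tab-separated file.
# Assumes a header row; read.delim() turns "EP Code" into the column EP.Code.
diff_by_ep_code <- function(path1, path2, out_path) {
  file1 <- read.delim(path1, stringsAsFactors = FALSE)
  file2 <- read.delim(path2, stringsAsFactors = FALSE)

  # Rows of each file whose EP code is missing from the other file
  only_in_1 <- file1[!(file1$EP.Code %in% file2$EP.Code), ]
  only_in_2 <- file2[!(file2$EP.Code %in% file1$EP.Code), ]

  # Stack both directions, tagging each row with the file it came from
  result <- rbind(
    data.frame(Source = rep("file1", nrow(only_in_1)), only_in_1,
               stringsAsFactors = FALSE),
    data.frame(Source = rep("file2", nrow(only_in_2)), only_in_2,
               stringsAsFactors = FALSE)
  )

  write.table(result, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(result)
}

# Usage with two small hypothetical files written to a temporary directory
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("EP Code\tEP Name\tRegion",
             "101654\tAlpha\tNorthwest",
             "103628\tBeta\tSouth",
             "104001\tGamma\tEast"), f1)
writeLines(c("EP Code\tEP Name\tRegion",
             "101654\tAlpha\tNorthwest",
             "103628\tBeta\tSouth",
             "106500\tDelta\tWest"), f2)

out <- tempfile(fileext = ".csv")
res <- diff_by_ep_code(f1, f2, out)
print(res)
```

Because write.table() is called with sep = "\t", the output file keeps the same tab-separated layout as the inputs, and the rows shared by both files are left out.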
Shane