5

I have two csv files.

File one has two columns:

DD1 abct
DD2 geate
DD3 watec
DD4 wwwca21
DD5 bate789

File two has one column:

abct
geate
bate789

I want to get a truncated file one to include those that matched with file two, i.e.

DD1 abct
DD2 geate
DD5 bate789

Could you mind to let me know how to do it with R?

New to R.

joran
  • 169,992
  • 32
  • 429
  • 468
newror
  • 53
  • 2
  • 4
  • 1
    The correct term for this is an "inner join". Calling it "truncating" in the title is not going to help the future search efforts. – IRTFM Aug 25 '11 at 16:27

2 Answers2

7

First, read the files with the read.table:

file1 <- read.table("file1.csv", col.names=c("FOO", "BAR"))
file2 <- read.table("file2.csv", col.names=c("BAR"))

Then merge them:

merged <- merge(file1, file2)

And write the result:

write.table(merged, "merged.csv")
Anatoliy
  • 1,350
  • 9
  • 9
  • Two more points: for more sophisticated operations, see: http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right – Anatoliy Aug 25 '11 at 08:54
  • And for really large files, R might not be the best option, command line utils like `join` (or `awk` / `sed`) will be faster, I think. – Anatoliy Aug 25 '11 at 08:56
0

This is a direct way to do it by doing %in%. This will be about the fastest way entirely within R.

read in the files

datf1 <- read.table("file1.csv") #two column file
datf2 <- read.table("file2.csv") #one column file

select the rows you want... %in% makes a logical vector that is the length of the first argument and is TRUE when an item is in both of the arguments and FALSE otherwise.

datf1 <- datf1[datf1[,2] %in% datf2[,1],]

write it out... I changed the file name from file1 because you really shouldn't overwrite your original data and make a new file1.

write.table(datf1, "file3.csv", sep = ',', row.names = FALSE, quote = FALSE)
John
  • 23,360
  • 7
  • 57
  • 83