Merge files of different lengths in r based on two variables (one match and one comparison)

Question

I have two files that I want to merge. The first file has several variables; the two important ones are ID and END. ID is not unique to a row; more than one row can have the same id. The second file has two variables, ID and START. In this file, each id is unique--there are no rows with the same id.

I want to create a third file. I want to keep rows from the first file that have an ID matching one of the IDs in the second file AND where the END in the first file is less than START from the second file. I only want to keep rows from the first file that match the criteria. I do not want to keep START value from the second file. Nor do I want to keep rows that do not match an ID and do not have END < START.

FILE 1  
ID    END       
1     333            
2     555           
3     789           
4     234



File 2
ID     START
 1       432
 2       777  
 3       444


New FILE
ID    END
1     333
2     555

Any help is appreciated.

I think you have an error in your examle, `start` in row 2 is larger than `end` in row 2. I'm almost done with the code I should have an answer in a bit — Adam, Jul 20 '16 at 22:14
Merge on ID column, then filter. Read about merge [here](http://stackoverflow.com/questions/1299871). — zx8754, Jul 20 '16 at 22:18

score 1 · Accepted Answer · answered Jul 20 '16 at 22:37

1

Assuming that file3 is the final one you want:

file3 = merge(file1, file2, by = "ID")
file3 = file3[file3$END < file3$START, c("ID","END")]

answered Jul 20 '16 at 22:37

Judu Le

81
7

score 0 · Answer 2 · answered Jul 20 '16 at 22:18

0

Lets assume you have df1 for file1 and df2 for file2 and you want to make df3 for your last output.

rows <- df1$Id %in% df2$ID
df3 <- df1[rows]
df3 <- df1[df1$End > df2$End]

answered Jul 20 '16 at 22:18

Adam

648
6
18

Merge files of different lengths in r based on two variables (one match and one comparison)

2 Answers2