I have two data frames. The first (df1) is a large lookup table that pairs two identifiers, rsid and uniq_id:
rsid uniq_id
rs796086906 1_13868_G_A
rs546169444 1_14464_T_A
rs6682375 1_14907_G_A
rs6682385 1_14930_G_A
The second (df2) contains only one of the two identifiers (uniq_id, in column V2):
V1 V2 V3 V4 V5 V6
1 1_10439_A_AC 0 10439 A AC
1 1_13417_CGAGA_C 0 13417 C CGAGA
1 1_14907_G_A 0 14907 G A
What I want is to replace each ID in V2 of df2 with the corresponding rsid from df1 wherever the two match, leaving IDs without a match unchanged. (I couldn't think of a succinct way to phrase this for the title, which is why the title is awkward and may be why I haven't found any duplicates.) I.e.:
V1 V2 V3 V4 V5 V6
1 1_10439_A_AC 0 10439 A AC
1 1_13417_CGAGA_C 0 13417 C CGAGA
1 rs6682375 0 14907 G A
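To make the example reproducible, the sample rows above can be recreated as data frames. This is a sketch: the column types (integers for V1, V3 and V4, character everywhere else) are my assumption about how the real files would be read in, and expected_V2 is just a hypothetical name for the desired result.

```r
# Lookup table: each uniq_id (chr_pos_allele_allele) paired with its rsid
df1 <- data.frame(
  rsid    = c("rs796086906", "rs546169444", "rs6682375", "rs6682385"),
  uniq_id = c("1_13868_G_A", "1_14464_T_A", "1_14907_G_A", "1_14930_G_A"),
  stringsAsFactors = FALSE
)

# Table to update: V2 holds uniq_id values, only some of which appear in df1
df2 <- data.frame(
  V1 = c(1L, 1L, 1L),
  V2 = c("1_10439_A_AC", "1_13417_CGAGA_C", "1_14907_G_A"),
  V3 = c(0L, 0L, 0L),
  V4 = c(10439L, 13417L, 14907L),
  V5 = c("A", "C", "G"),
  V6 = c("AC", "CGAGA", "A"),
  stringsAsFactors = FALSE
)

# Desired V2 after replacement: only the third row has a match in df1
expected_V2 <- c("1_10439_A_AC", "1_13417_CGAGA_C", "rs6682375")
```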
My current solution is a for loop with an if statement inside it:
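for (x in seq_len(nrow(df2))){
  if (df2$V2[x] %in% df1$uniq_id){
    # take the rsid from the df1 row whose uniq_id equals this row's V2
    df2$V2[x] = df1$rsid[which(df1$uniq_id == df2$V2[x])[1]]
  }
}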
However, because both files are extremely large, I suspect this row-by-row approach is very inefficient, and I'm wondering whether there is a faster method.
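For comparison, here is a loop-free sketch of the same replacement that uses a named character vector as the lookup table. This is a different technique from the loop above, not a benchmarked answer; lookup and hit are hypothetical names, and only the relevant columns of the two tables are recreated.

```r
# Minimal stand-ins for the two tables (only the columns that matter here)
df1 <- data.frame(
  rsid    = c("rs796086906", "rs546169444", "rs6682375", "rs6682385"),
  uniq_id = c("1_13868_G_A", "1_14464_T_A", "1_14907_G_A", "1_14930_G_A"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V2 = c("1_10439_A_AC", "1_13417_CGAGA_C", "1_14907_G_A"),
  V4 = c(10439L, 13417L, 14907L),
  stringsAsFactors = FALSE
)

# Named character vector: names are uniq_id, values are rsid
lookup <- setNames(df1$rsid, df1$uniq_id)

# Logical vector: which rows of df2 have an ID present in the lookup table?
hit <- df2$V2 %in% names(lookup)

# Replace only those rows in one vectorised step; unmatched IDs stay as they are
df2$V2[hit] <- unname(lookup[df2$V2[hit]])
```

If uniq_id ever contained duplicates, indexing by name would return the rsid of the first occurrence, the same behaviour as match().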
Someone suggested that the match() function might be quicker, but the R documentation for it says that %in% is the more intuitive interface, and given my inexperience with match() I'm not sure how to apply it any differently.
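As a sketch of the match() suggestion, again on small stand-in data: match() returns, for every ID in df2$V2, the row of df1 where that ID first occurs (or NA if it is absent), which is exactly the index the loop needs. %in% is documented as match(x, table, nomatch = 0) > 0, so the two are closely related; match() simply keeps the position instead of reducing it to TRUE/FALSE. idx and found are hypothetical names.

```r
# Minimal stand-ins for the two tables (only the columns that matter here)
df1 <- data.frame(
  rsid    = c("rs796086906", "rs546169444", "rs6682375", "rs6682385"),
  uniq_id = c("1_13868_G_A", "1_14464_T_A", "1_14907_G_A", "1_14930_G_A"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V2 = c("1_10439_A_AC", "1_13417_CGAGA_C", "1_14907_G_A"),
  stringsAsFactors = FALSE
)

# Position of each df2$V2 value in df1$uniq_id, NA where there is no match
idx <- match(df2$V2, df1$uniq_id)   # c(NA, NA, 3)

# Same logical vector that %in% would give, derived from the positions
found <- !is.na(idx)

# One vectorised assignment replaces the whole loop
df2$V2[found] <- df1$rsid[idx[found]]
```

If the real files were read with stringsAsFactors = TRUE (the default before R 4.0.0), V2 and the df1 columns would be factors and should be converted with as.character() before assigning new IDs into them.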
Any help appreciated.