I have two R data frames, each with a group column (one of 25 strings) and a position column (an integer). For each row of data frame A, I want to check whether data frame B contains a row in the same group whose position differs from A's position by less than 500. If so, I want to mark that row in A.
For example, the first row of A matches the third row of B: both are in group chr1, and their positions differ by 3202983 - 3202965 = 18 bp, which is less than 500 bp. It is therefore marked with + in the output table.
```
head(A)
  group     pos
1  chr1 3202965
2  chr1 3000168
3  chr1 3000204
4  chr2 3000560
5  chr2 3000674
6  chr3 3000698
head(B)
  group     pos
1  chr1 3180137
2  chr1 3200918
3  chr1 3202983
4  chr1 3309167
5  chr4 3458278
6  chr1 4249136
```
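For anyone who wants to experiment, the printed head() output above can be rebuilt as a small reproducible example. This sketch assumes `group` is stored as character (not factor) and `pos` as integer. It only covers the six rows shown, not the full tables.

```r
# The six rows shown by head(A) and head(B); the real tables are much larger
A <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  pos   = c(3202965L, 3000168L, 3000204L, 3000560L, 3000674L, 3000698L),
  stringsAsFactors = FALSE
)
B <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr1", "chr4", "chr1"),
  pos   = c(3180137L, 3200918L, 3202983L, 3309167L, 3458278L, 4249136L),
  stringsAsFactors = FALSE
)
```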
```
A_out <- magic(A,B)
head(A_out)
  group     pos out
1  chr1 3202965   +
2  chr1 3000168   -
3  chr1 3000204   -
4  chr2 3000560   -
5  chr2 3000674   -
6  chr3 3000698   -
```
My intuition would be a nested loop (over A, then over B) that checks every combination of rows. But my data frames are rather big (12,052,773 and 44,459 rows respectively), which means roughly 5.4 × 10^11 comparisons, so this would never finish.
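The nested-loop idea can be written with the inner loop over B vectorised. This is still n × m comparisons in total, so it does not solve the size problem. It is mainly useful as a correctness reference on small data. `mark_near_naive` is a hypothetical name. The sketch assumes `pos` is numeric with no missing values.

```r
A <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  pos   = c(3202965L, 3000168L, 3000204L, 3000560L, 3000674L, 3000698L),
  stringsAsFactors = FALSE
)
B <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr1", "chr4", "chr1"),
  pos   = c(3180137L, 3200918L, 3202983L, 3309167L, 3458278L, 4249136L),
  stringsAsFactors = FALSE
)

# Hypothetical reference implementation: each row of A is compared against
# every row of B at once (vectorised inner loop), n * m comparisons overall
mark_near_naive <- function(A, B, window = 500) {
  a_group <- as.character(A$group)  # avoid factor-level mismatches in ==
  b_group <- as.character(B$group)
  hit <- vapply(seq_len(nrow(A)), function(i) {
    any(b_group == a_group[i] & abs(B$pos - A$pos[i]) < window)
  }, logical(1))
  ifelse(hit, "+", "-")
}

mark_near_naive(A, B)
# [1] "+" "-" "-" "-" "-" "-"
```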
Is there a smarter approach to handle this?
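One way to avoid comparing every pair, sketched in base R, is to sort B's positions once within each group. Then `findInterval()` can locate, for every A position, its neighbours in that sorted vector. Only the closest B position at or below and the closest one above can be the nearest, so two subtractions per row of A decide the mark. This is roughly O(m log m + n log m) instead of O(n × m). `mark_near` is a hypothetical name. The sketch assumes `pos` is numeric without missing values and treats "smaller than 500" as strict.

```r
# Sample rows from the question; the real A and B are far larger
A <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr3"),
  pos   = c(3202965L, 3000168L, 3000204L, 3000560L, 3000674L, 3000698L),
  stringsAsFactors = FALSE
)
B <- data.frame(
  group = c("chr1", "chr1", "chr1", "chr1", "chr4", "chr1"),
  pos   = c(3180137L, 3200918L, 3202983L, 3309167L, 3458278L, 4249136L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "+" if some B row in the same group lies strictly
# closer than `window` to the A row's position, "-" otherwise
mark_near <- function(A, B, window = 500) {
  hit <- logical(nrow(A))
  # Sort B's positions once per group
  b_sorted <- lapply(split(B$pos, as.character(B$group)), sort)
  # Row indices of A, grouped the same way
  a_rows <- split(seq_len(nrow(A)), as.character(A$group))
  for (g in names(a_rows)) {
    b <- b_sorted[[g]]
    if (length(b) == 0) next  # group absent from B: no match possible
    idx <- a_rows[[g]]
    p <- A$pos[idx]
    # i = number of B positions <= p, so b[i] <= p < b[i + 1]
    i <- findInterval(p, b)
    # Distance to the nearest B position at or below p (Inf if none)
    left <- ifelse(i > 0, p - b[pmax(i, 1)], Inf)
    # Distance to the nearest B position above p (Inf if none)
    right <- ifelse(i < length(b), b[pmin(i + 1, length(b))] - p, Inf)
    hit[idx] <- pmin(left, right) < window
  }
  ifelse(hit, "+", "-")
}

A$out <- mark_near(A, B)
A$out
# [1] "+" "-" "-" "-" "-" "-"
```

With 25 groups the loop runs at most 25 times, and all per-row work inside it is vectorised. If the data.table package is an option, a rolling join with `roll = "nearest"` on group and position is another way to find each A row's nearest B position. It is not shown here.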