I have two dataframes, and I want to compare the pairs in dataframe b (df_2 below) to the pairs in dataframe a (df_1 below) to see whether each pair from b falls within (inclusive) the range of any pair in a. For instance, see below:

df_1 <- data.frame(x= c(-82.38319, -82.38318, -82.40397, -82.40417, -82.40423), 
                y= c(29.61212, 29.61125, 29.61130, 29.61134, 29.61167))
#Output:
#       x        y
# 1 -82.38319 29.61212
# 2 -82.38318 29.61125
# 3 -82.40397 29.61130
# 4 -82.40417 29.61134
# 5 -82.40423 29.61167

df_2 <- data.frame(o= c(-82.38320,-82.38317,-82.40397,-82.40416,-82.40424), 
                t= c(29.61212, 29.6114, 29.61130, 29.61133, 29.61167))
#Output:
#        o        t
# 1 -82.38320 29.61212
# 2 -82.38317 29.61140
# 3 -82.40397 29.61130
# 4 -82.40416 29.61133
# 5 -82.40424 29.61167

#made this dataframe as an example only.
desired_output <- data.frame(lat= df_2$o, lon= df_2$t, exists= c(NA, "YES","YES","YES",NA))
#Output I seek:
#       lat      lon    exists
# 1 -82.38320 29.61212   <NA> 
# 2 -82.38317 29.61140    YES
# 3 -82.40397 29.61130    YES
# 4 -82.40416 29.61133    YES
# 5 -82.40424 29.61167   <NA>

#explanation:
#1- even though -82.38320 is OK against rows 3, 4, 5 of df_1, 29.61212 is out of bounds for their co-paired values.
#2- row 2 of df_2 is within row 5 of df_1.
#3- row 3 of df_2 matches row 3 of df_1 exactly, hence inclusive.
#4- the row 4 pair matches, and its co-pair is less than that of row 4 in df_1.
#5- the pair at row 5 is out of bounds for every row of df_1.

#Column "exists" can be appended to dataframe b; only the result matters, neatness is not an issue.

I dug around on Stack Overflow and found nothing but this listing, but that person was comparing a single value with pairs, not pairs to pairs or pairs within pairs. I also tried cbind on both dataframes and comparing that way, but that failed.

What can I try next?

halfer
CaseebRamos
  • What is the rule for comparison? Why is row 2 "YES" and row 1 NA? – Ronak Shah Mar 08 '20 at 03:27
  • Thank you Ronak. The rule is that a pair (a2,b2) from df_2 has to be either less than both a1 & b1, or equal to a1 & b1; either entity of the pair (a2,b2) can be equal, but its co-pair must then match or be smaller than that of the pair in df_1. My terminology is a bit weak; I can explain further. Row 2 is YES because pair 2 of df_2 is smaller than both values (thus inside) of pair 5 of df_1. Pair 1 of df_2 is NA because it didn't meet the criteria I mentioned: (a2,b2) was a mismatch and one of its co-pairs was out of bounds. Sorry for the edits. – CaseebRamos Mar 08 '20 at 03:52
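
The rule described in this comment can be sketched for a single pair. This is only an illustration under an assumed reading of the rule: a pair (o, t) from df_2 counts as inside a row of df_1 when x <= o and y >= t, both inclusive, which is the reading that reproduces the desired output. The helper name rows_containing is hypothetical.

```r
# Sample data copied from the question.
df_1 <- data.frame(x = c(-82.38319, -82.38318, -82.40397, -82.40417, -82.40423),
                   y = c(29.61212, 29.61125, 29.61130, 29.61134, 29.61167))
df_2 <- data.frame(o = c(-82.38320, -82.38317, -82.40397, -82.40416, -82.40424),
                   t = c(29.61212, 29.61140, 29.61130, 29.61133, 29.61167))

# Hypothetical helper: indices of the rows of `ref` that contain the pair (o, t).
# Assumed rule: ref$x <= o and ref$y >= t, both bounds inclusive.
rows_containing <- function(o, t, ref) {
  which(ref$x <= o & ref$y >= t)
}

# Row 2 of df_2 is contained by rows 1 and 5 of df_1, so it would be "YES".
rows_containing(df_2$o[2], df_2$t[2], df_1)  # 1 5

# Row 1 of df_2 is contained by no row of df_1, so it would be NA.
rows_containing(df_2$o[1], df_2$t[1], df_1)  # integer(0)
```

A pair is flagged as soon as at least one row of df_1 contains it, so checking whether the returned vector is non-empty is enough to fill the exists column.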

2 Answers

We can use mapply to compare the o and t values of each row of df_2 with df_1, check whether the pair falls within the range of any row of df_1, and assign "YES" or NA accordingly.

df_2$exists <- c(NA, "YES")[mapply(function(x, y) 
                            any(df_1$x <= x & df_1$y >= y), df_2$o, df_2$t) + 1]

df_2
#           o        t exists
#1 -82.38320 29.61212   <NA>
#2 -82.38317 29.61140    YES
#3 -82.40397 29.61130    YES
#4 -82.40416 29.61133    YES
#5 -82.40424 29.61167   <NA>
Ronak Shah
We can use a non-equi join in data.table

library(data.table)
# Inclusive bounds on both columns; every df_2 row matched by any df_1 row is flagged.
setDT(df_2)[df_1, exists := "YES", on = .(o >= x, t <= y)]
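
For reference, a self-contained sketch of this kind of update join on the question's sample data, assuming inclusive bounds on both columns (o >= x and t <= y). The names dt_1 and dt_2 are illustrative copies of df_1 and df_2, not part of the answer.

```r
library(data.table)

# Sample data from the question, built directly as data.tables.
dt_1 <- data.table(x = c(-82.38319, -82.38318, -82.40397, -82.40417, -82.40423),
                   y = c(29.61212, 29.61125, 29.61130, 29.61134, 29.61167))
dt_2 <- data.table(o = c(-82.38320, -82.38317, -82.40397, -82.40416, -82.40424),
                   t = c(29.61212, 29.61140, 29.61130, 29.61133, 29.61167))

# Non-equi update join: for each row of dt_1, find the rows of dt_2 with
# o >= x and t <= y, and set exists to "YES" on them by reference.
# Rows of dt_2 that match no row of dt_1 keep NA.
dt_2[dt_1, exists := "YES", on = .(o >= x, t <= y)]

dt_2$exists
# NA "YES" "YES" "YES" NA
```

Because := updates by reference, no reassignment is needed, and the rows that never match are left as NA, which lines up with the output requested in the question.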
akrun