1

A post with the inverse problem has been asked before, but the same function does not apply.

Consider these two databases.

df1<- data.frame(TypeA=paste0("A",1:3), Position=c(1, 5, 9))
df2<- data.frame(TypeB=paste0("B",1:5),StartPosition=c(0, 3, 7, 8,  20), EndPosition=c(2, 4, 8, 18, 25))

How can I obtain the rows in df2 whose range (between StartPosition and EndPosition) includes the value (Position) in df1?

One would expect:

TypeB   StartPosition EndPosition  TypeA
B1      0             2            A1
B4      8             18           A3
Camilo
  • 153
  • 7

3 Answers3

1

I found a likely solution using the package sqldf:

Result= sqldf("select df1.*, df2.* from df1, df2 
  WHERE df1.Position between df2.StartPosition AND df2.EndPosition")
Camilo
  • 153
  • 7
1

Using join_by():

library(dplyr)

inner_join(df1, df2, join_by(between(Position, StartPosition, EndPosition)))
Mark
  • 7,785
  • 2
  • 14
  • 34
  • I get an error saying function does not exits... it looks like join_by is only available in the devel version... https://stackoverflow.com/questions/74141591/dplyr-non-equi-join-no-longer-accepting-join-by-argument – Camilo Sep 01 '23 at 03:19
  • No, that was from a year ago. Try running install.packages("dplyr") to update your dplyr! – Mark Sep 01 '23 at 03:48
  • 1
    This join_by is a very nice/useful hidden gem/function..came to learn...thank you @Mark – Camilo Sep 01 '23 at 17:59
  • @Camilo check out my answer from a while back I linked to! It has more details on it there – Mark Sep 01 '23 at 19:53
0

If you want to perform the operation rowise, you can do this:

df1 <- df1 %>% mutate(n = 1:n())
df2 <- df2 %>% mutate(n = 1:n())

df2 %>% 
  left_join(df1, by = "n") %>% 
  filter(between(Position, StartPosition, EndPosition))
  • but this assumes that the two databases also need to have in common the column you just created (n), can this work considering only the positions. – Camilo Sep 01 '23 at 02:00