0

enter image description here

I am trying to join to a lookup table based on the field value of my base table "Score" being between two field values in the lookup table ("ATTRIBUTE_2" and "ATTRIBUTE_3").

Basically the R version of the following SQL:

SELECT base.PolicyNo
    ,base.Score
    ,lkp.value ScoreFactor
FROM base base
INNER JOIN lkp lkp
    ON base.PERIL = lkp.PERIL
    AND base.COVERAGE = lkp.COVERAGE
    AND base.LocStCd = lkp.ATTRIBUTE_1
    AND base.Score > lkp.ATTRIBUTE_2
    AND base.Score <= lkp.ATTRIBUTE_3

I'm very new to R and still trying to grasp a lot of the syntax, so please be detailed in your responses :)

I've tried

TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd",ATTRIBUTE_2 < "Score",ATTRIBUTE_3 >= "TOTALTIERSCOREVA"), nomatch=NA]

and R tells me that ATTRIBUTE_2 and ATTRIBUTE_3 can't be found

I also tried

u = TIERLKP[DF1, on = c("PERIL","COVERAGE",ATTRIBUTE_1 = "LocStCd"), nomatch=NA]
    u = u[u$ATTRIBUTE_2 < u$Score,]
    u = u[u$ATTRIBUTE_3 >= u$Score,]

Which results in massive duplication of rows and the need for a cartesian join...doesn't seem efficient.

Per Phil's comment I tried

left_join(DF1,TIERLKP, by = c("PERIL","COVERAGE","LocStCd" = "ATTRIBUTE_1","Score" > "ATTRIBUTE_2", "Score" <= "ATTRIBUTE_3"),copy = TRUE)

with/without quotes as well as changing the order but am still getting "Error: Join columns must be present in data."

jrcalabrese
  • 2,184
  • 3
  • 10
  • 30
kmoore
  • 11
  • 2
  • The `dplyr` package has a set of `*_join` functions that are inspired by the SQL equivalents. See https://dplyr.tidyverse.org/reference/mutate-joins.html – Phil Dec 21 '22 at 17:27
  • 2
    Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) provide your data using `dput()`? Please avoid posting images of data or code. – jrcalabrese Dec 21 '22 at 19:03

1 Answers1

0

We may use the data.table syntax as below. Also, convert to data.table if the data is data.frame with setDT

library(data.table)
setDT(TIERLKP)
setDT(DF1)
TIERLKP[DF1, on = .(PERIL, COVERAGE,ATTRIBUTE_1 = LocStCd,
   ATTRIBUTE_2 < Score,
   ATTRIBUTE_3 >= TOTALTIERSCOREVA), nomatch=NA] 

In the devel version of dplyr, we can use join_by which does the non-equi join

library(dplyr)
inner_join(TIERLKP, DF1, by = join_by(PERIL, COVERAGE,
      ATTRIBUTE_1 = LocStCd,   
       ATTRIBUTE_2 < Score,
   ATTRIBUTE_3 >= TOTALTIERSCOREVA))
akrun
  • 874,273
  • 37
  • 540
  • 662