0

I have many look-up tables in my R, which have the following format:

lookup <- data.frame('value' = c(0,50),
                 'return' = c("0 to 50","over 50"))
  value  return
1     0 0 to 50
2    50 over 50

My main dataset looks like this:

main_df <- data.frame('somecolumn'=c(10,20,100,30))

  somecolumn
1         10
2         20
3        100
4         30

I need to join the two datasets using a left join with the main_df on the left, and the lookup table on the right. The match should be approximate. That is, values with a value between 0 to 50 should match to the first index of the lookup table, and those above 50 should match to the 2nd index. My actual lookup tables are much larger than this.

The result should be:

  somecolumn  return
1         10 0 to 50
2         20 0 to 50
3        100 over 50
4         30 0 to 50

Python can do this using the pd.merge_asof (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html). In excel, this can be done using VLOOKUP with TRUE as the final parameter.

How can I do this in a one line function in R? The reason I need to be one line is because I have many lookup tables, and this needs to be applied a lot of times.

I think this can be done using cut in R, which will remove the need for lookup, but I am not sure how to get it to output a character with the bins I require

user33484
  • 740
  • 2
  • 9
  • 36
  • 2
    Possible duplicate of [dplyr left\_join by less than, greater than condition](https://stackoverflow.com/questions/37289405/dplyr-left-join-by-less-than-greater-than-condition) – JasonAizkalns Sep 11 '19 at 20:24
  • 1
    Using `cut` you could do something like `within(main_df, return <- cut(somecolumn, c(0, 50, Inf)))` – dave-edison Sep 11 '19 at 20:33
  • @DiceboyT yep that does it, then I can just change my lookup tables to be just – user33484 Sep 11 '19 at 20:35

2 Answers2

3

The simplest solution that works for an arbitrary number of intervals (conditions):

ind <- findInterval(main_df$somecolumn, lookup$value, left.open = TRUE)
main_df$return <- lookup$return[ind]
slava-kohut
  • 4,203
  • 1
  • 7
  • 24
0

The most straightforward way that comes to my mine is using the ifelse function from the tidyverse

main_df <- main_df %>% 
  mutate(return = if_else(somecolumn < 50, "0 to 50", "over 50"))

You can exapnd the if as your data grows. See this and this

An alternative solution (edit)

main_df <- main_df %>% mutate(somecolumn2 = somecolumn/100) %>% 
  mutate(somecolumn2 = round(somecolumn2)) %>% 
  mutate(somecolumn2 = somecolumn2*100) %>% 
  mutate(somecolumn2 = somecolumn2/2) %>% 
  left_join(lookup, by = c("somecolumn2" = "value")) %>% 
  select(-somecolumn2)
Community
  • 1
  • 1
Orlando Sabogal
  • 1,470
  • 7
  • 20
  • Thanks! I think the only issue with this is my lookups can be hundreds of rows so the if_else would get too long to be readable. – user33484 Sep 11 '19 at 20:36
  • 1
    Then you could try the **left_join()** function (tidyverse). Though as you need an exact match you have to perform some transformation to your data. For example, something like divide by 100, **round()** the value and multiply by 100 is way simple way to see where the number is. 70 will return 100 and 20 will return 0. And that you your left join should work. – Orlando Sabogal Sep 11 '19 at 20:42