2

I have the two data frames -

  • Dataframe 1:
df1 <- read.table(text = "
Numbers
40500
4400025
14515
51432", header = TRUE)
  • Dataframe 2:
df2 <- read.table(text = "
ColumnA   ColumnB    ReturnC   ReturnD
32000       41400      rose      apple
51400       57389      orchid    orange
12523       16357      jasmine   grapes
10528       11034      lily      melon ", header = TRUE)

This is what I want -- Check if each value from Numbers in dataframe 1 is between ColumnA and ColumnB of dataframe 2. If it is, I want the corresponding ReturnC and ReturnD from dataframe 2 in the output.

If it is not, I want NA.

E.g. 51432 is in between column A and column B. Orchid and orange are the corresponding values in dataframe 2 for that intersection. This is how the final dataset will look like:

Final:

Numbers     Output1      Output2
40500        rose        apple
4400025      NA          NA
14515        jasmine     grapes 
51432        orchid      orange

How should I do this in R? -- Thank you for your help!

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
Rose
  • 29
  • 2

1 Answers1

4

You can use left_join + join_by from dplyr:

library(dplyr)

left_join(df1, df2, by = join_by(between(Numbers, ColumnA, ColumnB)))

#   Numbers ColumnA ColumnB ReturnC ReturnD
# 1   40500   32000   41400    rose   apple
# 2 4400025      NA      NA    <NA>    <NA>
# 3   14515   12523   16357 jasmine  grapes
# 4   51432   51400   57389  orchid  orange
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
  • 1
    Note that `join_by` was added in dplyr version 1.1.0 on Jan 29, 2023, so you might have to update to make this work – divibisan Feb 15 '23 at 17:02
  • 1
    Why is your last value in ColumnA 51400, in the example of OP it is 10528 – TarJae Feb 15 '23 at 17:17
  • 1
    @TarJae `left_join` keeps the order of the first data (i.e. `df1`). ColumnA and ColumnB are from `df2` and are joined to `df1`, so their order changed. The last `Numbers` is `51432`, which is between `51400` and `57389` from `df2`. That's reasonable. – Darren Tsai Feb 15 '23 at 17:24
  • Could you please: Check this `bind_cols(df1, df2) %>% mutate(across(starts_with("Return"), ~ifelse(between(Numbers, ColumnA, ColumnB), ., NA)))`. ? It is not correct. – TarJae Feb 15 '23 at 17:26
  • 1
    @TarJae What you do is row-wise comparison. But I think the OP want cross comparison. `df1` and `df2` are not necessary to have equal number of rows, so `bind_cols(df1, df2)` does not make sense. Your code fails in the 4th row because 51432 (4th `Numbers`) is not between 10528 (4th `ColumnA`) and 11034 (4th `ColumnB`). – Darren Tsai Feb 15 '23 at 17:40