6

I have two data frames (df1 and df2) in R with different information except for two columns, a person number (pnr) and a drug name (name). Row for row in df1, I want to check, if the combination of pnr and name exists somewhere in df2. If this combination exists, I want "yes" in a another column in df1. If not a "no".

df1
pnr|drug|...|check
---|----|---|-----
1  | 1  |...| no
1  | 2  |...| yes
2  | 2  |...| yes
3  | 2  |...| no
.....

df2
pnr|drug|...|
---|----|---|
1  | 2  |...|
2  | 2  |...|
....

For example, I want check, if the row combination pnr=1 & drug=1 exists in df2 (no), pnr=1 & drug=2 (yes) etc. And then place a "yes" or "no" in the check column in df1

I have tried the following for statement without luck. It does place a "yes or "no" in the "check" column, but it doesn't do it correctly

for(index in 1:nrow(df1)){
   if((df1[index,]$pnr %in% df2$pnr)&(df1[index,]$name %in% df2$name)){
   check_text="yes"}else{check_text="no"}
   df1$check=check_text
}

I have a sense that I should be using apply, but I haven't been able to figure that out. Do any of you have an idea how to solve this?

Frank
  • 2,386
  • 17
  • 26
reuss
  • 121
  • 1
  • 1
  • 9

3 Answers3

11

One way is using base R methods.

Pasting the columns pnr and drug together and finding a similar match in df1

df1$check <- ifelse(is.na(match(paste0(df1$pnr, df1$drug), 
                                        paste0(df2$pnr, df2$drug))),"No", "Yes")

#  pnr drug check
#1   1    1    No
#2   1    2   Yes
#3   2    2   Yes
#4   3    2    No
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
8

This is natural for dplyr::left_join:

library(dplyr) # for left_join, transmute
library(tidyr) # for replace_na

df1 <- expand.grid(pnr = 1:3, drug = 1:3)
df2 <- data.frame(pnr = c(1, 3), drug = c(2, 1))

df1 <- df1 %>%
  left_join(df2 %>% transmute(pnr, drug, check = 'yes')) %>%
  replace_na(list(check = 'no'))

df1

#>   pnr drug check
#> 1   1    1    no
#> 2   2    1    no
#> 3   3    1   yes
#> 4   1    2   yes
#> 5   2    2    no
#> 6   3    2    no
#> 7   1    3    no
#> 8   2    3    no
#> 9   3    3    no
4

We could use apply, checking for matches using the any function:

df1$check <- 
    apply(df1, 1, function(x) 
    ifelse(any(x[1] == df2$pnr & x[2] == df2$drug), 'yes','no'))

# df1

#    pnr drug check
# 1   1    1    no
# 2   1    2   yes
# 3   2    2   yes
# 4   3    2    no

data

df1 <- data.frame(pnr = c(1,1,2,3),
                  drug = c(1,2,2,2))

df2 <- data.frame(pnr = c(1,2),
                  drug = c(2,2))
bouncyball
  • 10,631
  • 19
  • 31