I have the toy dataset below, which is representative of a much larger dataset; these are the columns that matter. I'm trying to check whether each value in DataFrame appears in the reference dataframe for its type: Reference_A, Reference_B, or Reference_C.

DataFrame

group   type    value
x       A       Teddy
x       A       William
x       A       Lars
y       B       Robert
y       B       Elsie
y       C       Maeve
y       C       Charlotte
y       C       Bernard


Reference_A

type    value
A       Teddy
A       William
A       Lars

Reference_B

type    value
B       Elsie
B       Dolores

Reference_C

type    value
C       Maeve
C       Hale
C       Bernard

Desired output:

group   type    value      check
x       A       Teddy      TRUE
x       A       William    TRUE
x       A       Lars       TRUE
y       B       Robert     FALSE
y       B       Elsie      TRUE
y       C       Maeve      TRUE
y       C       Charlotte  FALSE
y       C       Bernard    TRUE

I posted a similar question here, but realized that TRUE/FALSE flags might be a more effective check: Check if values of one dataframe exist in another dataframe in exact order. I don't think order matters here, since I can manipulate my data so that all values are unique.

psychcoder

1 Answer

You can combine the "Reference" dataframes into one dataframe and join it with DataFrame by type. Then, for each type and value, you can check whether any reference value matches.

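library(dplyr)

# Stack the three reference tables, then join every DataFrame row to all
# reference rows of the same type (value.x = reference, value.y = DataFrame).
mget(paste0('Reference_', c('A', 'B', 'C'))) %>%
   bind_rows() %>%
   right_join(DataFrame, by = 'type') %>%
   group_by(group, type, value = value.y) %>%
   # TRUE if any reference value of that type equals the DataFrame value
   summarise(check = any(value.x == value.y))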


#  group type  value     check
#  <chr> <chr> <chr>     <lgl>
#1 x     A     Lars      TRUE 
#2 x     A     Teddy     TRUE 
#3 x     A     William   TRUE 
#4 y     B     Elsie     TRUE 
#5 y     B     Robert    FALSE
#6 y     C     Bernard   TRUE 
#7 y     C     Charlotte FALSE
#8 y     C     Maeve     TRUE 

data

Reference_A <- structure(list(type = c("A", "A", "A"), 
value = c("Teddy", "William", "Lars")), class = "data.frame", 
row.names = c(NA, -3L))

Reference_B <- structure(list(type = c("B", "B"), value = c("Elsie", "Dolores")), 
class = "data.frame", row.names = c(NA, -2L))

Reference_C <- structure(list(type = c("C", "C", "C"), value = c("Maeve", "Hale", 
"Bernard")), class = "data.frame", row.names = c(NA, -3L))

DataFrame <- structure(list(group = c("x", "x", "x", "y", "y", "y", "y", "y"), 
type = c("A", "A", "A", "B", "B", "C", "C", "C"), value = c("Teddy", 
"William", "Lars", "Robert", "Elsie", "Maeve", "Charlotte", "Bernard"
)), class = "data.frame", row.names = c(NA, -8L))
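
As a side note to the join above, here is a minimal base-R sketch of the same check that adds `check` as a new column, so every existing column and the original row order are kept. It is an alternative, not part of the answer's method. The helper `make_key`, the objects `reference` and `result`, and the `"|"` separator are illustrative assumptions; the separator only works if no `type` or `value` contains `"|"`. The toy data is redefined so the block runs on its own.

```r
# Toy data with the same values as the `data` section of this answer.
Reference_A <- data.frame(type = "A", value = c("Teddy", "William", "Lars"))
Reference_B <- data.frame(type = "B", value = c("Elsie", "Dolores"))
Reference_C <- data.frame(type = "C", value = c("Maeve", "Hale", "Bernard"))
DataFrame <- data.frame(
  group = c("x", "x", "x", "y", "y", "y", "y", "y"),
  type  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c("Teddy", "William", "Lars", "Robert", "Elsie",
            "Maeve", "Charlotte", "Bernard")
)

# Stack the references into a single lookup table.
reference <- rbind(Reference_A, Reference_B, Reference_C)

# Hypothetical helper: one "type|value" key per row.
# Assumption: "|" never occurs in `type` or `value`.
make_key <- function(df) paste(df$type, df$value, sep = "|")

# A row passes if its (type, value) pair occurs anywhere in the references.
result <- DataFrame
result$check <- make_key(DataFrame) %in% make_key(reference)

print(result)
```

Because nothing is summarised, identifier columns such as `group` are retained without having to list them in `group_by`.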
Ronak Shah
  • Thank you! However, `group` is lost in this example. I'd like to retain it for ID-ing values in my other analyses. Do you know how I can adapt your code to keep `group` as an identifier? – psychcoder Jul 28 '20 at 02:37
  • You can add that in `group_by`. See updated answer. – Ronak Shah Jul 28 '20 at 02:40
  • hmm... I'm getting an error that `group` is not found. I suspect this is because the reference dataframes do not contain `group`. – psychcoder Jul 28 '20 at 02:40
  • No... we are right-joining it with `DataFrame`, which has `group`. So this should work. Are you sure you are using the right name, or is that column called something else? – Ronak Shah Jul 28 '20 at 02:43
  • The column names and value names are different but correspond to the toy dataset structure. I'm attempting to debug this by turning everything into characters, but if you have other advice, I'd be all ears :) (I am, in fact, losing all other columns besides type, value, and check.) – psychcoder Jul 28 '20 at 02:51
  • `summarise` will only keep those columns that are in `summarise` and `group_by`. I have updated the answer with the data that I am using based on what you have shared. Verify if the answer works for you using that data. – Ronak Shah Jul 28 '20 at 02:55
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/218724/discussion-between-psychcoder-and-ronak-shah). – psychcoder Jul 28 '20 at 03:00
  • Nope all good! Reaccepted :) I was able to retain `group` by adapting this line `group_by(group = group.x, type, value = value.y)` – psychcoder Jul 28 '20 at 04:00