0

I am compiling a report where I compare fund IDs to find any discrepancies between (company name, ID) pairings across dataframes. Tables below:

Company ID
A 123
B 456
C 789

and

Company ID
B 453
A 123
C 789

In this case, ideally there is some function that reads each company name, identifies each corresponding ID, and then finds the same company name with corresponding ID from the next table. If they are the same value, I assume it would return TRUE and if different, return FALSE.

In the example above, it would return TRUE, FALSE, TRUE since the only discrepancy is with company B.

I'm thinking it must be some nested variation of sapply but for the life of me I can't figure out what the syntax would be (since the companies will be on different rows).

toastienf
  • 57
  • 4
  • 3
    A near duplicate of [Is there an R function for finding the index of an element in a vector?](https://stackoverflow.com/questions/5577727/is-there-an-r-function-for-finding-the-index-of-an-element-in-a-vector). `paste(df1$company, df1$ID) %in% paste(df2$company, df2$ID)` should do the work. – ekoam Jan 09 '22 at 02:53
  • Your wording is vague. Are you guaranteed to only ever have 2 tables, or do you need to accomodate more? – camille Jan 09 '22 at 20:47

1 Answers1

3

I think a join with a comparison works well here.

library(dplyr)

df_x<-tibble::tribble(
       ~Company,  ~ID,
            "A", 123L,
            "B", 456L,
            "C", 789L
       )

df_y<-tibble::tribble(
       ~Company,  ~ID,
            "B", 453L,
            "A", 123L,
            "C", 789L
       )

df_joined<-df_x %>%
  left_join(df_y, by="Company") %>%
  mutate(discrepancy=(ID.x!=ID.y))

df_joined

#> # A tibble: 3 × 4
#>   Company  ID.x  ID.y discrepancy
#>   <chr>   <int> <int> <lgl>       
#> 1 A         123   123 FALSE       
#> 2 B         456   453 TRUE        
#> 3 C         789   789 FALSE

Created on 2022-01-09 by the reprex package (v2.0.1)

Joe Erinjeri
  • 1,200
  • 1
  • 7
  • 15