Return values not found for each ID - R

Question

I want to identify the unmatched values in Vendors data frame for each vendor. In other words, find the countries that are not located in the Vendors data frame for each vendor.

I have a data frame (Vendors) that looks like this:

Vendor_ID	Vendor	Country_ID	Country
1	Burger King	2	USA
1	Burger King	3	France
1	Burger King	5	Brazil
1	Burger King	7	Turkey
2	McDonald's	5	Brazil
2	McDonald's	3	France

Vendors <- data.frame (
Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
      Vendor = c("Burger King", "Burger King", "Burger King", "Burger King", "McDonald's", "McDonald's"),
                  Country_ID = c("2", "3", "5", "7", "5", "3"),
                  Country = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"))

and I have another data frame (Countries) that looks like this:

Country_ID	Country
2	USA
3	France
5	Brazil
7	Turkey

Countries <- data.frame (Country_ID = c("2", "3", "5", "7"),
                        Country = c("USA", "France", "Brazil", "Turkey"))

Desired Output:

Vendor_ID	Vendor	Country_ID	Country
2	McDonald's	2	USA
2	McDonald's	7	Turkey

Can someone please tell me how could this be achieved in R? I tried subset & ant-join but the results are not correct.

Hi. If you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) you could make it easier for others to find and test an answer to your question. That way you can help others to help you! — dario, Feb 11 '21 at 12:34

fabla · Accepted Answer · 2021-02-11T12:49:42.960

In Base R we could first split the data by Vendors

VenList <- split(df, df$Vendor)

and then we can check wich country is missing and return it.

res <- lapply(VenList, function(x){
  
  # Identify missing country of vendors
  tmp1 <- df2[!(df2[, "Country"] %in% x[, "Country"]), ]
  
  # get vendor and vendor ID
  tmp2 <- x[1:nrow(tmp1), 1:2]
  
  # cbind
  if(nrow(tmp2) == nrow(tmp1)){
    cbind(tmp2, tmp1)
  }
})

# Which yields

res

# $BurgerKing
# NULL
# 
# $`McDonald's`
#   Vendor_ID     Vendor Country_ID Country
# 5         2 McDonald's          2     USA
# 6         2 McDonald's          7  Turkey

# If you want it as one df you could then flatten to 

do.call(rbind, res)

#              Vendor_ID     Vendor Country_ID Country
# McDonald's.5         2 McDonald's          2     USA
# McDonald's.6         2 McDonald's          7  Turkey

Data

df <- read.table(text = "1  BurgerKing  2   USA
1   BurgerKing  3   France
1   BurgerKing  5   Brazil
1   BurgerKing  7   Turkey
2   McDonald's 5    Brazil
2   McDonald's 3    France", col.names = c("Vendor_ID", "Vendor",   "Country_ID",   "Country"))

df2 <- read.table(text = "2 USA
3   France
5   Brazil
7   Turkey", col.names = c("Country_ID",    "Country")) `

score 1 · Answer 2 · answered Feb 11 '21 at 13:08

Solution using expand.grid to create all possible Vendor - Country combinations (assuming that "Countries" has only one entry per country) and then using dplyr to join "Vendors" and find "missing countries"

Edit: The last two lines (left_joins) are only needed to "translate" the ID columns into "text":

library(dplyr)

expand.grid(Vendor_ID=unique(Vendors$Vendor_ID), Country_ID=Countries$Country_ID) %>% 
  left_join(Vendors) %>% 
  filter(is.na(Vendor)) %>%
  select(Vendor_ID, Country_ID) %>% 
  left_join(Countries) %>% 
  left_join(unique(Vendors[, c("Vendor_ID", "Vendor")]))

Returns

  Vendor_ID Country_ID Country     Vendor
1         2          2     USA McDonald's
2         2          7  Turkey McDonald's

Return values not found for each ID - R

2 Answers2