1

I want to identify the unmatched values in Vendors data frame for each vendor. In other words, find the countries that are not located in the Vendors data frame for each vendor.

I have a data frame (Vendors) that looks like this:

Vendor_ID Vendor Country_ID Country
1 Burger King 2 USA
1 Burger King 3 France
1 Burger King 5 Brazil
1 Burger King 7 Turkey
2 McDonald's 5 Brazil
2 McDonald's 3 France
Vendors <- data.frame (
Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
      Vendor = c("Burger King", "Burger King", "Burger King", "Burger King", "McDonald's", "McDonald's"),
                  Country_ID = c("2", "3", "5", "7", "5", "3"),
                  Country = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"))

and I have another data frame (Countries) that looks like this:

Country_ID Country
2 USA
3 France
5 Brazil
7 Turkey
Countries <- data.frame (Country_ID = c("2", "3", "5", "7"),
                        Country = c("USA", "France", "Brazil", "Turkey"))

Desired Output:

Vendor_ID Vendor Country_ID Country
2 McDonald's 2 USA
2 McDonald's 7 Turkey

Can someone please tell me how could this be achieved in R? I tried subset & ant-join but the results are not correct.

dario
  • 6,415
  • 2
  • 12
  • 26
  • Hi. If you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) you could make it easier for others to find and test an answer to your question. That way you can help others to help you! – dario Feb 11 '21 at 12:34

2 Answers2

2

In Base R we could first split the data by Vendors

VenList <- split(df, df$Vendor)

and then we can check wich country is missing and return it.

res <- lapply(VenList, function(x){
  
  # Identify missing country of vendors
  tmp1 <- df2[!(df2[, "Country"] %in% x[, "Country"]), ]
  
  # get vendor and vendor ID
  tmp2 <- x[1:nrow(tmp1), 1:2]
  
  # cbind
  if(nrow(tmp2) == nrow(tmp1)){
    cbind(tmp2, tmp1)
  }
})

# Which yields

res

# $BurgerKing
# NULL
# 
# $`McDonald's`
#   Vendor_ID     Vendor Country_ID Country
# 5         2 McDonald's          2     USA
# 6         2 McDonald's          7  Turkey

# If you want it as one df you could then flatten to 

do.call(rbind, res)

#              Vendor_ID     Vendor Country_ID Country
# McDonald's.5         2 McDonald's          2     USA
# McDonald's.6         2 McDonald's          7  Turkey

Data

df <- read.table(text = "1  BurgerKing  2   USA
1   BurgerKing  3   France
1   BurgerKing  5   Brazil
1   BurgerKing  7   Turkey
2   McDonald's 5    Brazil
2   McDonald's 3    France", col.names = c("Vendor_ID", "Vendor",   "Country_ID",   "Country"))

df2 <- read.table(text = "2 USA
3   France
5   Brazil
7   Turkey", col.names = c("Country_ID",    "Country")) `
fabla
  • 1,806
  • 1
  • 8
  • 20
1

Solution using expand.grid to create all possible Vendor - Country combinations (assuming that "Countries" has only one entry per country) and then using dplyr to join "Vendors" and find "missing countries"

Edit: The last two lines (left_joins) are only needed to "translate" the ID columns into "text":

library(dplyr)

expand.grid(Vendor_ID=unique(Vendors$Vendor_ID), Country_ID=Countries$Country_ID) %>% 
  left_join(Vendors) %>% 
  filter(is.na(Vendor)) %>%
  select(Vendor_ID, Country_ID) %>% 
  left_join(Countries) %>% 
  left_join(unique(Vendors[, c("Vendor_ID", "Vendor")]))

Returns

  Vendor_ID Country_ID Country     Vendor
1         2          2     USA McDonald's
2         2          7  Turkey McDonald's
dario
  • 6,415
  • 2
  • 12
  • 26