I have a variable within a data frame that I want to quality-control. The variable lists locations (character). I have another data frame that includes all the alternate names for the same locations. I want a TRUE/FALSE result indicating whether each value of the variable in my data frame matches any of the alternate names in the separate data frame. Is there a way to do this?

For example, the variable I want to quality-control is called FishingGround:

FishingGround

Lobster Bay
Deep Cove
Whale Head

My other data frame has all the possible names for each location. I want to create a for loop that goes through each observation in my FishingGround variable and checks whether it matches one of the listed alternate names.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Most likely you would like something like a `semi_join` or `anti_join` from `dplyr`. – MrFlick Feb 28 '20 at 21:03
  • What do you mean by 'alternative names'? Are you simply hoping to scan another data frame for matching instances of the elements in the first data frame? Try to clarify your intentions. – DryLabRebel Feb 28 '20 at 21:33

1 Answer

I like to do this with a look-up table. Use the acceptable names as the names of the entries, then index the vector by the values you want to check. Any name that is not on the list returns NA. Example:

FishingGround = c("Lobster Bay", "Deep Cove", "Whale Head")

AcceptableNames =  c("Lobster Bay", "Lobster Claw", 
    "Deep Cove", "Shallow Cove", "Whale Tail")
names(AcceptableNames) = AcceptableNames 

AcceptableNames[FishingGround]
  Lobster Bay     Deep Cove          <NA> 
"Lobster Bay"   "Deep Cove"            NA 

The NAs correspond to unacceptable entries:

## Unacceptable names
FishingGround[which(is.na(AcceptableNames[FishingGround]))]
[1] "Whale Head"

## Acceptable names
FishingGround[which(!is.na(AcceptableNames[FishingGround]))]
[1] "Lobster Bay" "Deep Cove"
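
Since the question asks for a TRUE/FALSE result, the same check can also be sketched with base R's `%in%` operator, which returns one logical value per element of the vector on its left. The data frame names `catch` and `alt_names` and the columns `AltName` and `Matched` are assumptions made up for illustration; only `FishingGround` comes from the question.

```r
# Hypothetical data frames; only FishingGround is named in the question
catch <- data.frame(
  FishingGround = c("Lobster Bay", "Deep Cove", "Whale Head"),
  stringsAsFactors = FALSE
)
alt_names <- data.frame(
  AltName = c("Lobster Bay", "Lobster Claw",
              "Deep Cove", "Shallow Cove", "Whale Tail"),
  stringsAsFactors = FALSE
)

# TRUE where the value appears among the alternate names, FALSE otherwise
catch$Matched <- catch$FishingGround %in% alt_names$AltName

# Entries that failed the check
catch$FishingGround[!catch$Matched]
```

Because `%in%` is vectorised, no explicit for loop is needed. Note that it matches exact strings only, so differences in case or stray whitespace would count as a mismatch.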
G5W