I'm attempting to join two tables, one is a smaller table with a column of names of common food items (e.g. "Corn", "Peppers", "Squash"...etc...), and the other is a larger table with specific food names (e.g. "Sweet Corn", "Red Corn", "Baby Corn", "Zucchini Squash", "Orange Squash", "Squash , Large"...etc...). I'm trying to join based on a "fuzzy" condition in which I specify to join on the food names and pull the most frequent code in another column of the larger table (the mode) into a new column in the smaller table.
Here is an example of the smaller table:
Food Name | Food Code |
---|---|
Corn | NA |
Squash | NA |
Peppers | NA |
Here is an example of the larger table:
Food Name | Food Code |
---|---|
Sweet Corn | 532 |
Red Corn | 532 |
Baby Corns | 944 |
Squash | 111 |
Long Squash | 123 |
Red Pepper | 654 |
Green Pepper | 655 |
Red Peppers | 654 |
I've tried the "left_join" function from the dplyr package, but this doesn't seem to work that well with the "fuzzy" string join feature. I know that the tidyverse also has a function to find the mode of grouped variables and I was hoping to use that function, but I am unsure how to incorporate that into the left_join statement. I also discovered the fuzzyjoin package in R, but I am not certain if this is the best option.
My desired output would look like:
Food Name | Food Code |
---|---|
Corn | 532 |
Squash | 111 |
Peppers | 654 |