I have two datasets.
One is the data I'm currently working on and need to change (spelling mistakes), it looks like this:
df<-structure(list(username = c("hmaens", "pmgcann", "gsamse", "SCundan",
"kflower1", "ahazra"), Department = c("Hematology Oncology2",
"Pediatric Hematology Oncology", "Cancer Institute",
"Hematology Oncology Cancer InstituteClinical Research Center",
"Emergency Medicine Research", "Emergency Medicine Resaerch"),
`Access Control` = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes"
), `Organizational Unit` = structure(c(1L, 1L, 1L, 1L, 2L,
2L), .Label = c("Cancer Institute", "General Research"
), class = "factor"), ManagementGroup = c("Cancer Institute - Hematology Oncology",
"Cancer Institute - Pediatric Hematology Oncology",
"Cancer Institute - Cancer Institooote", "Cancer Institute - HematologyOncology Cancer Institute Clinical Research Center",
"General Research - Emergency Medicine Resaerch", "General Research - EmergencyMedicine Research"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
The other data frame is a reference list of how the Management groups should be spelled:
df2<-structure(list(Department = c("General Research - Cardiology ",
"General Research - Dermatology Clinical Studies Unit ", "General Research - Infectious Diseases ",
"General Research - Clinical Research Center ", "General Research - Nephrology ",
"General Research - Pediatric Endocrinology; Metabolism ", "General Research - Pediatric Hematology\\Oncology ",
"General Research - Radiation Therapy ", "Cancer Institute - Cancer Institute ",
"Cancer Institute - Neurology - LCI ", "Cancer Institute - Neurosurgery - LCI ",
"Cancer Institute - Pediatric Hematology/Oncology-LCI ",
"Cancer Institute - Pediatric Hemophilia/Thrombosis Center - LCI ",
"Cancer Institute - Radiation Therapy - LCI ", "General Research - Cardiology ",
"General Research - Dermatology Clinical Studies Unit ", "General Research - Diagnostic Imaging ",
"General Research - Emergency Medicine Research ", "General Research - Clinical Research Center ",
"General Research - Nephrology ", "General Research - Neurology ",
"Cancer Institute - Hematology/Oncology ", "Cancer Institute - Cancer Institute ",
"Cancer Institute - Neurology - LCI ")), row.names = c(NA,
-24L), class = c("tbl_df", "tbl", "data.frame"))
My question is: is there a way to automatically reference this second data frame for 'correct' spelling of words and change columns en masse?
I realize I could individually fix spelling mistakes with the methods used in this answer, where for instance I could write individual lines of code that say "no, resAErch should instead be spelled 'research'"... but is there a way to have R look for the 'nearest' spelling in that second data frame and change it to that?
Phrased another way, could R code be written that would check df$Managementgroup and notice that "Cancer Institute-Cancer Insitooote" is really similar to "Cancer Institute - Cancer Institute" found in df2$Department and then fix the spelling?
If that makes sense, ideally it would incorporate spelling and spaces from the second dataframe as well.