Suppose I have the following data:
example = tibble::tibble(
  id = 1:10,
  vac = c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca")
)
Which looks like this:
# A tibble: 10 x 2
      id vac
   <int> <chr>
 1     1 FFizer
 2     2 sinovasm
 3     3 aztraseneca
 4     4 phiser
 5     5 sonovac
 6     6 faizer
 7     7 sinivasc
 8     8 astraseneca
 9     9 sinocav
10    10 aztraxeneca
I want to check whether each value of the variable vac matches, to some degree, any option from a reference vector.
Say the vector to use as the reference is:
labs = c("sinovac", "pfizer", "astrazeneca")
Crossing the example data frame with the vector labs should give output like this:
correction = tibble::tibble(
  id = 1:10,
  vac = c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca"),
  match = c("pfizer", "sinovac", "astrazeneca", "pfizer", "sinovac",
            "pfizer", "sinovac", "astrazeneca", "sinovac", "astrazeneca")
)
Which looks like this:
# A tibble: 10 x 3
      id vac         match
   <int> <chr>       <chr>
 1     1 FFizer      pfizer
 2     2 sinovasm    sinovac
 3     3 aztraseneca astrazeneca
 4     4 phiser      pfizer
 5     5 sonovac     sinovac
 6     6 faizer      pfizer
 7     7 sinivasc    sinovac
 8     8 astraseneca astrazeneca
 9     9 sinocav     sinovac
10    10 aztraxeneca astrazeneca
The main idea is to end up with a homogeneous vac variable.
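To make the goal concrete, here is a rough sketch of one possible approach in base R, using `utils::adist()`, which computes the generalized Levenshtein (edit) distance between every pair of strings. It assumes that the label with the smallest edit distance is the intended one and that case should be ignored; neither assumption is part of the problem statement. It uses a plain `data.frame` to avoid extra dependencies, and `d`, `best` and `result` are just illustrative names.

```r
# Input data, as in the example above (base data.frame instead of a tibble)
vac  <- c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca")
labs <- c("sinovac", "pfizer", "astrazeneca")

# Edit-distance matrix: one row per value of vac, one column per label.
# ignore.case = TRUE is an assumption: "FFizer" is compared as "ffizer".
d <- utils::adist(vac, labs, ignore.case = TRUE)

# For each row, pick the column (label) with the smallest distance
best <- apply(d, 1, which.min)

result <- data.frame(
  id    = seq_along(vac),
  vac   = vac,
  match = labs[best],
  stringsAsFactors = FALSE
)
print(result)
```

If two labels are equally close, `which.min()` returns the first one, so ties are resolved by the order of `labs`.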
In addition to this, I'd like to create a variable that indicates the "matching degree". That is, if the string is "FFizer", then its match would be "pfizer" and their matching degree would be around 0.66.
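One possible definition of the "matching degree" is a normalized edit similarity: 1 minus the edit distance divided by the length of the longer of the two strings. This is only a sketch under that assumption, and other string metrics would give different numbers. With a case-sensitive comparison, "FFizer" vs "pfizer" needs 2 substitutions over 6 characters, which gives 1 - 2/6, roughly 0.67 and in line with the figure mentioned above. The names `d`, `best`, `dist_best` and `degree` are illustrative.

```r
vac  <- c("FFizer", "sinovasm", "aztraseneca", "phiser", "sonovac",
          "faizer", "sinivasc", "astraseneca", "sinocav", "aztraxeneca")
labs <- c("sinovac", "pfizer", "astrazeneca")

# Case-sensitive edit distances (the default for adist)
d    <- utils::adist(vac, labs)
best <- apply(d, 1, which.min)

# Distance to the chosen label, picked out with matrix indexing (row, column)
dist_best <- d[cbind(seq_along(vac), best)]

# Normalize by the longer string so the score lies between 0 and 1
longest <- pmax(nchar(vac), nchar(labs[best]))
degree  <- 1 - dist_best / longest

scored <- data.frame(
  id     = seq_along(vac),
  vac    = vac,
  match  = labs[best],
  degree = round(degree, 2),
  stringsAsFactors = FALSE
)
print(scored)
```

Passing `ignore.case = TRUE` to `adist()` would raise the score for "FFizer" to 1 - 1/6, so whether case differences should count as errors is a choice to make up front.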