Suppose I have the following data frame of car brands, some of which are misspelled. How can I find the centroid of each brand (word) and assign that centroid to the most "similar" words, so that I get a second column, pal_ok, with the normalized brand names? The input is shown first, followed by the desired output.
db <- data.frame(pal1 = c("fiat","fiat","fiat","fiat 1","fiatt","fait","fiaat","renault","renault","renault","renaultt","renault 3","renaultc","remault"))
        pal1
1       fiat
2       fiat
3       fiat
4     fiat 1
5      fiatt
6       fait
7      fiaat
8    renault
9    renault
10   renault
11  renaultt
12 renault 3
13  renaultc
14   remault
db <- data.frame(pal1 = c("fiat","fiat","fiat","fiat 1","fiatt","fait","fiaat","renault","renault","renault","renaultt","renault 3","renaultc","remault"),
                 pal_ok = c("fiat","fiat","fiat","fiat","fiat","fiat","fiat","renault","renault","renault","renault","renault","renault","renault"))
        pal1  pal_ok
1       fiat    fiat
2       fiat    fiat
3       fiat    fiat
4     fiat 1    fiat
5      fiatt    fiat
6       fait    fiat
7      fiaat    fiat
8    renault renault
9    renault renault
10   renault renault
11  renaultt renault
12 renault 3 renault
13  renaultc renault
14   remault renault
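For reference, here is a minimal base-R sketch of one way this might be approached. It is not the only option. It rests on several assumptions that are not part of the question: similarity is measured with Levenshtein edit distance (`adist`), the words are grouped by average-linkage hierarchical clustering (`hclust`), the number of brands is known in advance (`k = 2`), and the "centroid" is taken to be the medoid, that is, the spelling with the smallest total distance to the other rows in its cluster, since strings have no arithmetic mean. The helper name `medoid` is made up for illustration.

```r
# Input data from the question
db <- data.frame(pal1 = c("fiat","fiat","fiat","fiat 1","fiatt","fait","fiaat",
                          "renault","renault","renault","renaultt","renault 3",
                          "renaultc","remault"),
                 stringsAsFactors = FALSE)

# Pairwise Levenshtein distances between all rows (duplicates included,
# so frequent spellings weigh more when the medoid is chosen)
d <- adist(db$pal1)

# Average-linkage hierarchical clustering on the edit distances.
# ASSUMPTION: the number of brands (k = 2) is known beforehand.
hc <- hclust(as.dist(d), method = "average")
db$cluster <- cutree(hc, k = 2)

# Hypothetical helper: the medoid of a cluster is the member whose summed
# distance to every other member of the same cluster is smallest
medoid <- function(idx) {
  db$pal1[idx][which.min(rowSums(d[idx, idx, drop = FALSE]))]
}

# One representative spelling per cluster, then mapped back onto each row
reps <- vapply(split(seq_len(nrow(db)), db$cluster), medoid, character(1))
db$pal_ok <- unname(reps[as.character(db$cluster)])

print(db[, c("pal1", "pal_ok")])
```

On the sample data, the two brands are far enough apart in edit distance that this should reproduce the pal_ok column shown above. With real data the number of clusters is usually unknown, so cutting the tree at a distance threshold (`cutree(hc, h = ...)`) may be a better fit than a fixed `k`.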