I have a character vector and want to create a matrix of string distances for each pair of vector values (using the stringdist package). Currently, I have an implementation with nested for-loops:
library(stringdist)
strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
m <- matrix(nrow = length(strings), ncol = length(strings))
colnames(m) <- strings
rownames(m) <- strings
for (i in 1:nrow(m)) {
  for (j in 1:ncol(m)) {
    m[i,j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
  }
}
which results in the following matrix:
> m
         Hello Helo Hole Apple Ape New Old System Systemic
Hello        0    1    3     4   5   4   4      6        7
Helo         1    0    2     4   4   3   3      6        7
Hole         3    2    0     3   3   4   2      5        7
Apple        4    4    3     0   2   5   4      5        7
Ape          5    4    3     2   0   3   3      5        7
New          4    3    4     5   3   0   3      5        7
Old          4    3    2     4   3   3   0      6        8
System       6    6    5     5   5   5   6      0        2
Systemic     7    7    7     7   7   7   8      2        0
However, if I have, for instance, a vector of length 1000 with many non-unique values, this matrix is quite large (let's say, 800 rows by 800 columns) and the loops are very slow. I would like to optimize the performance, e.g. by using apply functions, but I don't know how to translate the above code into apply syntax. Can anyone help?
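For reference, here is a minimal sketch of the kind of loop-free form I have in mind. It assumes that stringdist() is vectorised over its second argument, so one call can return a whole column, and that it is enough to compare the distinct lowercased values only. The names u and m2 are just placeholders for this illustration, and the row and column names come out lowercased, unlike in the loop version. I have not measured whether this is actually faster on a large vector.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")

# Assumption: only distinct lowercased values need to be compared,
# since duplicates would just repeat rows and columns.
u <- unique(tolower(strings))

# One stringdist() call per value: each call compares s against all of u
# and returns one column of Levenshtein distances.
m2 <- sapply(u, function(s) stringdist::stringdist(s, u, method = "lv"))
dimnames(m2) <- list(u, u)

# The package also has stringdistmatrix(), which may avoid the sapply entirely:
# m3 <- stringdist::stringdistmatrix(u, u, method = "lv", useNames = "strings")
```

Because Levenshtein distance is symmetric, it should not matter that sapply fills the matrix column by column rather than row by row.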