I am trying to apply a function to every pair of columns of a very large matrix. Eventually I want to create a 40,000 by 40,000 matrix (where only one side of the diagonal is filled in) or a list of the results.
The matrix looks like:
            obs 1     obs 2     obs 3     obs 4     obs 5     obs 6     obs 7     obs 8     obs 9
words 1 0.2875775 0.5999890 0.2875775 0.5999890 0.2875775 0.5999890 0.2875775 0.5999890 0.2875775
words 2 0.7883051 0.3328235 0.7883051 0.3328235 0.7883051 0.3328235 0.7883051 0.3328235 0.7883051
words 3 0.4089769 0.4886130 0.4089769 0.4886130 0.4089769 0.4886130 0.4089769 0.4886130 0.4089769
words 4 0.8830174 0.9544738 0.8830174 0.9544738 0.8830174 0.9544738 0.8830174 0.9544738 0.8830174
words 5 0.9404673 0.4829024 0.9404673 0.4829024 0.9404673 0.4829024 0.9404673 0.4829024 0.9404673
words 6 0.0455565 0.8903502 0.0455565 0.8903502 0.0455565 0.8903502 0.0455565 0.8903502 0.0455565
I apply the function with cosine(mat[, 3], mat[, 4]), which gives me a single number:
[,1]
[1,] 0.7546113
I can do this for all of the columns, but I want to know which columns each result came from, i.e. the calculation above came from columns 3 and 4, which are "obs 3" and "obs 4".
Expected output might be the results in a list or a matrix like:
     [,1] [,2] [,3]
[1,] 1    .    .
[2,] 0.75 1    .
[3,] 0.23 0.87 1
(Where the numbers here are made up)
So the dimensions will be ncol(mat) by ncol(mat) (if I go with the matrix method).
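For the matrix method, a minimal sketch using base R only (so not the lsa function from the question): scaling each column to unit length and taking the cross-product gives every pairwise cosine at once, and the column names carry over to both dimensions. The seed, the runif(2000) data, and the names unit and co_all are assumptions made for illustration.

```r
# Example data (assumed): 100 words by 20 observations, named as in the question.
set.seed(1)
mat <- matrix(runif(2000), nrow = 100, ncol = 20,
              dimnames = list(paste("words", 1:100), paste("obs", 1:20)))

# Scale every column to unit length; the cross-product of the scaled
# matrix then holds the cosine of each pair of columns.
unit <- sweep(mat, 2, sqrt(colSums(mat^2)), "/")
co_all <- crossprod(unit)        # ncol(mat) x ncol(mat), dimnames kept

# Blank out the upper triangle so only one side of the diagonal is filled.
co_all[upper.tri(co_all)] <- NA

# Look up a result by the names of the columns it came from.
co_all["obs 4", "obs 3"]
```

At 40,000 columns the result is still a dense 40,000 by 40,000 matrix, so this would help with speed and labelling but not with memory.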
Data/Code:
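# generate some data
# (note: runif(200) is recycled to fill the 100 x 20 = 2000 cells, so all
# odd-numbered columns are identical, as are all even-numbered ones)
mat <- matrix(data = runif(200), nrow = 100, ncol = 20,
              dimnames = list(paste("words", 1:100), paste("obs", 1:20)))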
mat
#calculate the following function
library(lsa)
cosine(mat[, 3], mat[, 4])
cosine(mat[, 4], mat[, 5])
cosine(mat[, 5], mat[, 6])
Additional
I thought about doing the following:
- Creating an empty matrix and filling it with the function's results in a for loop, but it's not working as expected, and creating a 40,000 by 40,000 matrix of 0's brings up memory issues.
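# empty result matrix, labelled with the column names on both sides
co <- matrix(0L, nrow = ncol(mat), ncol = ncol(mat), dimnames = list(colnames(mat), colnames(mat)))
co
# fill the lower triangle: row i, column j holds the cosine of columns i and j
for (i in 2:ncol(mat)) {
  for (j in 1:(i - 1)) {
    co[i, j] <- cosine(mat[, i], mat[, j])
  }
}
co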
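As a rough check on the memory issue, the size of the result can be estimated before allocating anything. The sketch below assumes 8 bytes per double and ignores R's small per-object overhead; n, n_pairs, full_gib and tri_gib are names chosen for illustration.

```r
n <- 40000                       # number of columns in the real data

# Full n x n matrix of doubles (8 bytes each), in GiB.
full_gib <- n^2 * 8 / 2^30

# Only the pairs below the diagonal: n * (n - 1) / 2 values.
n_pairs <- n * (n - 1) / 2
tri_gib <- n_pairs * 8 / 2^30

round(c(full = full_gib, lower_triangle = tri_gib), 1)
```

That is roughly 12 GiB for the full matrix and roughly 6 GiB for one triangle. Since the matrix of 0L is integer, the first cosine assigned into it also forces a conversion of the whole matrix to double.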
I also tried putting the results into a list:
List <- list()
for (i in 1:ncol(mat)) {
  temp <- List[[i]] <- mat
}
res <- List[1][[1]]
res
Which is also wrong.
So I am trying to create a function that will go through the matrix column by column, calculate the cosine for each pair of columns, and store the results along with the names of the columns they came from.
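A minimal sketch of the list-style result, assuming base R only: combn() enumerates each pair of columns once, and the column names are stored next to each value. The helper cosine_pair, the seed, the runif(2000) data, and the column labels col_a and col_b are hypothetical names introduced for illustration; cosine_pair is a stand-in for the lsa function.

```r
# Example data (assumed): 100 words by 20 observations, named as in the question.
set.seed(1)
mat <- matrix(runif(2000), nrow = 100, ncol = 20,
              dimnames = list(paste("words", 1:100), paste("obs", 1:20)))

# Hypothetical helper: cosine of the angle between two numeric vectors.
cosine_pair <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# One row per unordered pair of column indices (i < j).
pairs <- t(combn(ncol(mat), 2))

# Long-format result: the two column names and the cosine between them.
res <- data.frame(
  col_a  = colnames(mat)[pairs[, 1]],
  col_b  = colnames(mat)[pairs[, 2]],
  cosine = mapply(function(i, j) cosine_pair(mat[, i], mat[, j]),
                  pairs[, 1], pairs[, 2])
)
head(res)
```

For 40,000 columns this would produce about 800 million rows, so in practice the pairs would probably have to be processed in chunks or filtered as they are computed.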