Here's a simplified version of the code I'm working on:
# Randomly populated matrices as example/toy data.
adata <- sample(1:25,2500,replace=T)
xdim <- 50
ydim <- 50
dim(adata) <- c(xdim,ydim)
# All possible categories
cats <- 1:25
# Retrieve the coordinates of the members of each class
# These subsets are used to calculate the minimum distances.
subA <- vector("list",length(cats))
for(i in 1:length(cats)){
  subA[[i]] <- which(adata==cats[i],arr.ind=T)
}
# Matrix containing Euclidean distances for each combination of dX and dY
eucdist <- matrix(nrow=xdim,ncol=ydim)
eucdist <- sqrt(((row(eucdist)-1)^2)+((col(eucdist)-1)^2))
# Create an array containing the x,y coordinates of each cell (in pixels). Used to calculate distances
xycoor <- array(dim=c(xdim,ydim,2))
xycoor[,,1] <- row(xycoor[,,1])
xycoor[,,2] <- col(xycoor[,,2])
# For each cell, find the minimum distance to a cell belonging to each category
mindistA <- array(dim=c(xdim,ydim,length(cats)))
for(i in 1:length(cats)){
  # For each cell, look up in eucdist the distances to all cells of class i
  # (via the absolute row/column offsets + 1) and keep the minimum
  mindistA[,,i] <- apply(xycoor,1:2,function(x) min(eucdist[abs(sweep(subA[[i]],2,x,"-"))+1],na.rm=T))
}
The aim of this is to:
- create a list containing, for each class contained in cats, the row and column numbers of the cells corresponding to that class;
- for each cell of my matrix, calculate the distance to the closest cell corresponding to each class.
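As a quick sanity check on the toy data (just to illustrate the structure I am after, not part of the actual analysis):
str(subA[[1]])       # an n x 2 matrix of (row, col) coordinates of class-1 cells
dim(mindistA)        # 50 x 50 x 25: one minimum-distance layer per class
# cells belonging to class k should be at distance 0 from class k
all(mindistA[,,1][adata == cats[1]] == 0)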
This works well with my toy data, but I'm currently running it on larger datasets (matrices of roughly 1000 * 2000 elements instead of the 50 * 50 used in the example here), and it is very slow; it would probably take days to complete. Based on this question, I suspect that accessing elements of eucdist using an expression as the index is what is slowing down the execution.
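If it helps to test that suspicion, the lookup can be timed on its own, roughly like this (the cell c(25,25) and the repetition count are arbitrary choices for the sketch):
# Time just the per-cell lookup for a single class, repeated many times,
# to see how much the computed matrix indexing itself costs
x <- c(25, 25)   # an arbitrary cell
system.time(
  replicate(1e4, min(eucdist[abs(sweep(subA[[1]], 2, x, "-")) + 1], na.rm = T))
)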
A possible solution would be to 1) create an array of dimensions length(subA[[i]]) * 50 * 50 * 2 containing the distances between the X and Y coordinates (respectively) of each grid cell and of each cell corresponding to class i, and 2) use the contents of this array to look up the desired value in eucdist (possibly using rownames and colnames, as suggested in the comments to the question linked above). However, I'm afraid that in my case this will generate arrays that are too large for R to handle.
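To make steps 1) and 2) a bit more concrete, here is roughly what I have in mind for a single class i (untested at the full scale; dx, dy, d and mind_i are just placeholder names for this sketch):
i <- 1
# 1) Absolute row/column offsets between every cell of class i and every grid cell;
#    dx and dy have dimensions nrow(subA[[i]]) x xdim x ydim
dx <- abs(outer(subA[[i]][,1], xycoor[,,1], "-"))
dy <- abs(outer(subA[[i]][,2], xycoor[,,2], "-"))
# 2) Use the offsets (+ 1) to look up the precomputed distances in eucdist,
#    then take the minimum over the class-i cells for every grid cell
d <- array(eucdist[cbind(c(dx) + 1, c(dy) + 1)], dim = dim(dx))
mind_i <- apply(d, 2:3, min)   # should match mindistA[,,i] from the loop above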
Does anyone have a suggestion for speeding this up? Any advice would be greatly appreciated!
Thanks!
PS: I previously tried calculating the Euclidean distance directly from within the apply call using rdist from the fields package, but this was also very slow, which is why I decided to use a lookup table of Euclidean distances instead.
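For context, that rdist attempt was along these lines (a simplified sketch of the approach for a single class, not the exact code):
library(fields)
i <- 1
# For every cell, compute the distances to all cells of class i directly with
# rdist() and keep the minimum (no precomputed lookup table)
mind_i <- apply(xycoor, 1:2, function(x) min(rdist(rbind(x), subA[[i]])))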