Hello and thank you all for looking at my question.
My goal is to find the fastest way to copy specific distance values from a small symmetric data frame (dist.data), whose row and column names identify spatial locations, into a large symmetric data frame (final.data), whose row and column names identify individual observations. Some observations share the same location, which is why the two data frames have different dimensions. I am considering sapply, mclapply, and a nested for loop, but I am open to all suggestions.
I got the sapply and nested for loop versions to work and found that the nested loop was twice as fast. However, I could not get the mclapply version to work.
#preliminary set up for reproducible example
set.seed(41)
# final df; used in the nested for loop
final.data<-matrix(NA,nrow=100,ncol=100)
rownames(final.data)<-seq(1:100)
colnames(final.data)<-rownames(final.data)
# make a symmetric 100 x 100 matrix
dist.data <- matrix(rep(0,10000), nrow=100)
# the lower triangle of a 100 x 100 matrix has choose(100,2) = 4950 entries
dist.data[lower.tri(dist.data)] <- seq(from=1,to=choose(100,2),by=1)
dist.data <- t(dist.data)
dist.data[lower.tri(dist.data)] <- seq(from=1,to=choose(100,2),by=1)
rownames(dist.data)<-seq(1:100)
colnames(dist.data)<-rownames(dist.data)
# spatial ID of each person; sampling with replacement allows several people per location
spat.ID.test<-sample(1:100, 100, replace=TRUE)
Using sapply:
dummy <- function(row, column){
return(dist.data[spat.ID.test[row],spat.ID.test[column]])
}
ptm <- proc.time()
final.data<-as.data.frame(sapply(1:100,function(row) sapply(1:100, function(column) dummy(row,column))))
proc.time() - ptm
Using mclapply:
library(parallel) # provides detectCores() and mclapply()
numCores <- detectCores()
dummy <- function(row, column){
return(dist.data[spat.ID.test[row],spat.ID.test[column]])
}
ptm <- proc.time()
final.data<-as.data.frame(mclapply(1:100, function(row) mclapply(1:100, function(column) dummy(row,column),mc.cores = numCores),mc.cores=numCores))
proc.time() - ptm
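For reference, here is a minimal sketch of one way the parallel version might be made to run. It is not a confirmed fix for the attempt above: it assumes the parallel package is available, parallelises over rows only instead of nesting mclapply calls, and binds the resulting list of row vectors with do.call(rbind, ...) rather than as.data.frame. The names rows and final.par, the core count of 2, and the rebuilt stand-in data are assumptions for illustration.

```r
library(parallel)

set.seed(41)
n <- 100

# Stand-in symmetric distance matrix and spatial IDs, mirroring the setup above
dist.data <- matrix(0, n, n)
dist.data[lower.tri(dist.data)] <- seq_len(choose(n, 2))
dist.data <- dist.data + t(dist.data)
spat.ID.test <- sample(1:n, n, replace = TRUE)

# Forking is not available on Windows, where mc.cores must be 1.
# The value 2 is an arbitrary illustrative choice, not a recommendation.
numCores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parallelise over rows only; each worker returns one full row as a numeric vector
rows <- mclapply(1:n, function(row) {
  dist.data[spat.ID.test[row], spat.ID.test]
}, mc.cores = numCores)

# Bind the list of row vectors into an n x n matrix
final.par <- do.call(rbind, rows)
```

For a problem this small, the cost of starting worker processes may well outweigh any gain, so this sketch is meant to show the shape of a working call rather than a faster option.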
Using a nested for loop:
ptm <- proc.time()
for (row in 1:100){
for (column in 1:100){
#270 is the column for spatialID
y1 <- spat.ID.test[row]    # spatialID, in df.full, of the row's observation (max of 7079, i.e. the number of unique spatialIDs)
x1 <- spat.ID.test[column] # spatialID of the column's observation
final.data[row,column] <- dist.data[y1,x1]
}
}
proc.time() - ptm
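Since every cell is just dist.data looked up at a pair of spatial IDs, the whole loop can probably be replaced by a single matrix-indexing call. The sketch below rebuilds a stand-in version of the example data and checks that the vectorised lookup matches the nested loop; loop.result and vec.result are hypothetical names. It assumes the spatial IDs are integer positions into dist.data. If they were labels instead, indexing with as.character(spat.ID.test) against the row and column names would be needed.

```r
set.seed(41)
n <- 100

# Stand-in symmetric distance matrix and spatial IDs, mirroring the setup above
dist.data <- matrix(0, n, n)
dist.data[lower.tri(dist.data)] <- seq_len(choose(n, 2))
dist.data <- dist.data + t(dist.data)
spat.ID.test <- sample(1:n, n, replace = TRUE)

# Nested loop, as in the question, writing into a preallocated matrix
loop.result <- matrix(NA_real_, n, n)
for (row in 1:n) {
  for (column in 1:n) {
    loop.result[row, column] <- dist.data[spat.ID.test[row], spat.ID.test[column]]
  }
}

# Vectorised lookup: index rows and columns by the ID vector in one call
vec.result <- dist.data[spat.ID.test, spat.ID.test]

# Both approaches should give the same matrix
stopifnot(isTRUE(all.equal(loop.result, vec.result)))
```

Wrapping each approach in system.time() would give a timing comparison on the real data.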
Thank you!!
Note: since the output will also be a symmetric matrix, it is possible to compute only the lower (or upper) triangle and then copy it into the upper (or lower) triangle. To do this I set the upper limit of column to row. However, I am not sure about the best way to transpose the filled triangle into the empty one.
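One common idiom for mirroring a filled lower triangle into the upper one is to index the transpose with upper.tri. The sketch below uses a small hypothetical 4 x 4 matrix m with made-up values. It assumes a plain numeric matrix rather than a data frame.

```r
# Small stand-in matrix with only the lower triangle (and diagonal) filled
m <- matrix(NA_real_, 4, 4)
m[lower.tri(m, diag = TRUE)] <- c(0, 1, 2, 3, 0, 4, 5, 0, 6, 0)

# Copy the lower triangle into the upper triangle by indexing the transpose
m[upper.tri(m)] <- t(m)[upper.tri(m)]
```

After this step m[i, j] equals m[j, i] for every pair, so only about half of the cells need to be computed in the loop.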