I have large sets of 3D data consisting of 1D signals acquired in 2D space. The first step in processing this data is thresholding all signals to find the arrival of a high-amplitude pulse. This pulse is present in all signals and arrives at different times. After thresholding, the 3D data set should be reordered so that every signal starts at the arrival of the pulse and what came before is thrown away (the end of the signals is of no importance, as of now i concatenate zeros to the end of all signals so the data remains the same size).
Now, I have implemented this in the following manner:
First, i start by calculating the sample number of the first sample exceeding the threshold in all signals
M = randn(1000,500,500); % example matrix of realistic size
threshold = 0.25*max(M(:,1,1)); % 25% of the maximum in the first signal as threshold
[~,index] = max(M>threshold); % indices of first sample exceeding threshold in all signals
Next, I want all signals to be shifted so that they all start with the pulse. For now, I have implemented it this way:
outM = zeros(size(M)); % preallocation for speed
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
This works fine, and i know for-loops are not that slow anymore, but this easily takes a few seconds for the datasets on my machine. A single iteration of the for-loop takes about 0.05-0.1 sec, which seems slow to me for just copying a vector containing 500-2000 double values.
Therefore, I have looked into the best way to tackle this, but for now I haven't found anything better. I have tried several things: 3D masks, linear indexing, and parallel loops (parfor).
for 3D masks, I checked to see if any improvements are possible. Therefore i first contruct a logical mask, and then compare the speed of the logical mask indexing/copying to the double nested for loop.
%% set up for logical mask copying
AA = logical(ones(500,1)); % only copy the first 500 values after the threshold value
Mask = logical(zeros(size(M)));
Jepla = zeros(500,size(M,2),size(M,3));
for i = 1:size(M,2)
for j = 1:size(M,3)
Mask(index(1,i,j):index(1,i,j)+499,i,j) = AA;
end
end
%% speed comparison
tic
Jepla = M(Mask);
toc
tic
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
toc
The for-loop is faster every time, even though there is more that's copied.
Next, linear indexing.
%% setup for linear index copying
%put all indices in 1 long column
LongIndex = reshape(index,numel(index),1);
% convert to linear indices and store in new variable
linearIndices = sub2ind(size(M),LongIndex,repmat(1:size(M,2),1,size(M,3))',repelem(1:size(M,3),size(M,2))');
% extend linear indices with those of all values to copy
k = zeros(numel(M),1);
count = 1;
for i = 1:numel(LongIndex)
values = linearIndices(i):size(M,1)*i;
k(count:count+length(values)-1) = values;
count = count + length(values);
end
k = k(1:count-1);
% get linear indices of locations in new matrix
l = zeros(length(k),1);
count = 1;
for i = 1:numel(LongIndex)
values = repelem(LongIndex(i)-1,size(M,1)-LongIndex(i)+1);
l(count:count+length(values)-1) = values;
count = count + length(values);
end
l = k-l;
% create new matrix
outM = zeros(size(M));
%% speed comparison
tic
outM(l) = M(k);
toc
tic
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
toc
Again, the alternative approach, linear indexing, is (a lot) slower. After this failed, I learned about parallelisation, and though this would for sure speed up my code. By reading some of the documentation around parfor and trying it out a bit, I changed my code to the following:
gcp;
outM = zeros(size(M));
inM = mat2cell(M,size(M,1),ones(size(M,2),1),size(M,3));
tic
parfor i = 1:500
for j = 1:500
outM(:,i,j) = [inM{i}(index(1,i,j):end,1,j);zeros(index(1,i,j)-1,1)];
end
end
end
toc
I changed it so that "outM" and "inM" would both be sliced variables, as I read this is best. Still this is very slow, a lot slower than the original for loop.
So now the question, should I give up on trying to improve the speed of this operation? Or is there another way in which to do this? I have searched a lot, and for now do not see how to speed this up. Sorry for the long question, but I wanted to show what I tried. Thank you in advance!