I have an optimization problem in MATLAB. Assume I get the following three vectors as input:
- A, of size (1 x N): time series of the signal amplitude
- F, of size (1 x N): time series of the signal instantaneous frequency
- fx, of size (M x 1): frequency axis that I want to match the above onto
Now, the elements of F will usually not match the entries of fx exactly (99% of the time they will not), which is why I have to match each one to the closest frequency.
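For illustration, assuming fx is sorted in ascending order (it is a frequency axis, and it later turns out to be evenly spaced), the nearest-bin lookup can also be written with interp1. This is just a sketch of the matching step, not the method benchmarked below:

% Nearest-neighbour lookup: for every F(i), the index of the closest
% entry of fx (queries outside the axis clamp to the nearest endpoint).
idx = interp1(fx, 1:numel(fx), F, 'nearest', 'extrap');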
Here's the catch: we are talking about big data. N can easily be up to 2 million, and this has to be run hundreds of times on several hundred subjects. My two concerns:
- Time (main concern)
- Memory (production will run on machines with 16+ GB of memory, but development is on a machine with only 8 GB; see the size estimate below)
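For scale: a dense M x N double matrix at M = 201 and N = 2,604,000 is 201 * 2604000 * 8 bytes ≈ 4.2 GB, so even the output alone is a sizeable chunk of the development machine's 8 GB.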
I have these two working solutions. For the following, N = 2604000 and M = 201:
Method 1 (for-loop)
Simple for-loop. Memory is no problem at all, but it is time-consuming. Easiest implementation.
tic;
I = zeros(M,N);                   % pre-allocate output
for i = 1:N
    [~,f] = min(abs(fx-F(i)));    % index of the closest frequency bin
    I(f,i) = A(i).^2;             % store the squared amplitude at that bin
end
toc;
Duration: 18.082 seconds.
Method 2 (vectorized)
The idea is to match the frequency axis against each instantaneous frequency to get the bin index:
          F   [ 0.9  0.2  2.3  1.4 ]      N
        [ 0 ] [  0    1    0    0  ]
   fx   [ 1 ] [  1    0    0    1  ]
        [ 2 ] [  0    0    1    0  ]
                                          M
And then multiply each column by the squared amplitude at that time.
tic;
m_Ff = repmat(F,M,1);             % every row is a copy of F          (M x N)
m_fF = repmat(fx,1,N);            % every column is a copy of fx      (M x N)
[~,idx] = min(abs(m_Ff - m_fF));  % closest bin for every sample      (1 x N)
clearvars m_Ff m_fF;
m_if = repmat(idx,M,1);           % bin index replicated down rows    (M x N)
clearvars idx;
m_fi = repmat((1:M)',1,N);        % row numbers                       (M x N)
I = double(m_if==m_fi);           % indicator matrix                  (M x N)
clearvars m_if m_fi;
I = bsxfun(@times,I,A.^2);        % scale each column by the squared amplitude
toc;
Duration: 64.223 seconds. This surprised me, but it is probably because the huge variables and my limited memory force MATLAB to swap them to disk. I have an SSD, though.
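That would fit the numbers: m_Ff, m_fF and the temporary abs(m_Ff - m_fF) are each 201 x 2,604,000 doubles (≈ 4.2 GB), so Method 2 peaks at well over 12 GB of working set, far beyond the 8 GB available during development.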
The only thing I have not taken advantage of is that the matrices will have many zero elements. I will try and look into sparse matrices.
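That should help a lot here: each column of I holds exactly one nonzero, so a sparse (compressed-column) version needs roughly N values + N row indices + (N+1) column pointers at 8 bytes each, i.e. about 60 MB instead of 4.2 GB.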
I need at least single precision for both the amplitudes and frequencies, but I found that converting from double to single takes a lot of time.
Any suggestions on how to improve?
UPDATE
Following the suggestions, I am now down to a combined time of 2.53 seconds. This takes advantage of the fact that fx is monotonically increasing and evenly spaced (always starting at 0). Here is the code:
tic; df = mode(diff(fx)); toc; % Find fx step size
tic; idx = round(F./df+1); toc; % Convert to bin ids
tic; I = zeros(M,N); toc; % Pre-allocate output
tic; lin_idx = idx + (0:N-1)*M; toc; % Find indices to insert at
tic; I(lin_idx) = A.^2; toc; % Insert
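(For reference, the manual lin_idx computation above is equivalent to sub2ind:)

% Same as idx + (0:N-1)*M: convert (row, column) subscripts
% into linear indices of the M x N matrix I.
lin_idx = sub2ind([M N], idx, 1:N);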
The timing outputs are the following:
Elapsed time is 0.000935 seconds.
Elapsed time is 0.021878 seconds.
Elapsed time is 0.175729 seconds.
Elapsed time is 0.018815 seconds.
Elapsed time is 2.294869 seconds.
Hence the most time-consuming step is now the very final one. Any advice on this is greatly appreciated. Thanks to @Peter and @Divakar for getting me this far.
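Presumably the final assignment is slow because I is a ≈ 4.2 GB dense array and one element is written per column, i.e. every 201*8 = 1608 bytes, so essentially every page of the zero-filled matrix gets touched and the step is memory-bound rather than compute-bound.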
UPDATE 2 (Solution)
Wuhuu! Using sparse(i,j,k) really improves the outcome:
tic; df = fx(2)-fx(1); toc;         % Find fx step size
tic; idx = round(F./df+1); toc;     % Convert to bin ids
tic; I = sparse(idx,1:N,A.^2); toc; % Build the sparse output directly
With timings:
Elapsed time is 0.000006 seconds.
Elapsed time is 0.016213 seconds.
Elapsed time is 0.114768 seconds.
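One caveat: sparse(i,j,k) infers the matrix size from the largest indices, so if the top frequency bin never occurs in F, I ends up with fewer than M rows. Passing the size explicitly avoids that:

% Force the output to be exactly M x N even if max(idx) < M.
I = sparse(idx, 1:N, A.^2, M, N);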