I have code that calls ismember(A,B) some 2^20 times on various gpuArrays A and B, where A is a non-sparse matrix with several million integer entries and sorted rows, and B is a non-sparse sorted vector of a few thousand distinct integer entries. If it helps, with linear indexing A(:) can be had in sorted form.
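To make the shapes concrete, a representative (made-up) setup looks something like this; the sizes and value range are placeholders, not my actual data:

    A = gpuArray(sort(randi(1e6, 2048, 2048), 2));   % non-sparse integer matrix, several million entries, rows sorted
    B = gpuArray(unique(randi(1e6, 4096, 1)));       % non-sparse sorted vector of a few thousand distinct integers
    lia = ismember(A, B);                            % this call is repeated on the order of 2^20 times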
For sorted (integer) non-GPU arrays, the fastest option is builtin('_ismemberhelper',a,b), with ismembc somewhat slower. Both are much faster than ismember (since they omit all the input checks), but neither can operate on gpuArrays, and both are still slower than ismember on gpuArrays. That is, in terms of speed:
ismember on GPU > builtin('_ismemberhelper',a,b) > ismembc() > ismember on CPU
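For reference, my timing harness looks roughly like the following (reusing the placeholder arrays above, gathered back to the CPU for the non-gpuArray variants; as far as I can tell the undocumented helpers expect sorted input):

    Ac = gather(A);  Bc = gather(B);       % CPU copies for the non-gpuArray variants
    as = sort(Ac(:));                      % sorted column vector for the undocumented helpers

    tHelper = timeit(@() builtin('_ismemberhelper', as, Bc));   % undocumented builtin helper
    tMembc  = timeit(@() ismembc(as, Bc));                      % undocumented, requires sorted B
    tCpu    = timeit(@() ismember(Ac, Bc));
    tGpu    = gputimeit(@() ismember(A, B));                    % A, B still gpuArrays from above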
Now, I have looked in the main ismember.m file to see what code it uses, but the only part that seems relevant is this:
else % (a,b are some other class like gpuArray, sym object)
    lia = false(size(a));
    if nargout <= 1
        for i=1:numelA
            lia(i) = any(a(i)==b(:));   % ANY returns logical.
        end
    else
        for i=1:numelA
            found = a(i)==b(:);         % FIND returns indices for LOCB.
            if any(found)
                lia(i) = true;
                found = find(found);
                locb(i) = found(1);
            end
        end
    end
end
(Other seemingly relevant parts of the code used functions like unique and sortrows, which do not support gpuArrays.) Not only does this not look like GPU-accelerated code, but, as expected, it also does not come close to the performance of ismember for gpuArrays. Thus:
(Question 1) Is the routine for the GPU-accelerated version of ismember openly accessible (like ismember.m is)?
(Question 2) More importantly, is there a function/algorithm that would be faster than the GPU-accelerated ismember for my specific case (sorted integer-valued arrays of the aforementioned sizes)?
I am currently using MATLAB R2014b and a GTX 460 with 1 GB of VRAM.
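For what it is worth, one direction I have been considering for Question 2 is a dense lookup table, since B is small and the values are integers. A rough sketch, assuming the values are positive integers whose maximum (here a hypothetical valMax) is small enough for the table to fit in GPU memory:

    valMax = 1e6;                          % hypothetical upper bound on the integer values
    lut = gpuArray.false(valMax, 1);       % dense logical lookup table over the value range
    lut(B) = true;                         % mark the members of B
    lia = lut(A);                          % membership reduces to indexing, entirely on the GPU

I have not verified whether this actually beats ismember on the GPU for my sizes, and it obviously breaks down if the value range is large, so I would still appreciate pointers to a proper algorithm.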