Vectorization for meshgrid
and ndgrid
If you are still interested in finding a vectorized implementation to make the meshgrid
based code in the problem faster, let me suggest a vectorized method with bsxfun
and it's GPU ported version. I strongly believe that people must look into vectorization with GPUs
as a promising option to speed up MATLAB
codes. Codes that employ meshgrid
or ndgrid
and whose outputs are to be operated with some elementwise operation setup a perfect ground to employ bsxfun
into those codes. To add to that, the use of GPU with bsxfun
, that lets it work on the elements independently with hundreds and thousands of CUDA cores available, makes it just perfect for GPU implementation.
For your specific problem, the inputs were -
x = -2:0.01:2;
y = -2:0.01:2;
Next, you had -
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);
With bsxfun
, this becomes a one-liner -
z = sin(bsxfun(@minus,x.^2,y.^2.'));
Benchmarking
GPU benchmarking tips were taken from Measure and Improve GPU Performance.
%// Warm up GPU call with insignificant small scalar inputs
temp1 = sin_sqdiff_vect2(0,0);
N_arr = [50 100 200 500 1000 2000 3000]; %// array elements for N (datasize)
timeall = zeros(3,numel(N_arr));
for k = 1:numel(N_arr)
N = N_arr(k);
x = linspace(-20,20,N);
y = linspace(-20,20,N);
f = @() sin_sqdiff_org(x,y);%// Original CPU code
timeall(1,k) = timeit(f);
clear f
f = @() sin_sqdiff_vect1(x,y);%// Vectorized CPU code
timeall(2,k) = timeit(f);
clear f
f = @() sin_sqdiff_vect2(x,y);%// Vectorized GPU(GTX 750Ti) code
timeall(3,k) = gputimeit(f);
clear f
end
%// Display benchmark results
figure,hold on, grid on
plot(N_arr,timeall(1,:),'-b.')
plot(N_arr,timeall(2,:),'-ro')
plot(N_arr,timeall(3,:),'-kx')
legend('Original CPU','Vectorized CPU','Vectorized GPU (GTX 750 Ti)')
xlabel('Datasize (N) ->'),ylabel('Time(sec) ->')
Associated functions
%// Original code
function z = sin_sqdiff_org(x,y)
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);
return;
%// Vectorized CPU code
function z = sin_sqdiff_vect1(x,y)
z = sin(bsxfun(@minus,x.^2,y.^2.')); %//'
return;
%// Vectorized GPU code
function z = sin_sqdiff_vect2(x,y)
gx = gpuArray(x);
gy = gpuArray(y);
gz = sin(bsxfun(@minus,gx.^2,gy.^2.')); %//'
z = gather(gz);
return;
Results

Conclusions
As the results show, vectorized method with GPU shows good performance improvement which is about 4.3x
against the vectorized CPU code and 6x
against the original code. Please keep in mind that GPU has to overcome a minimum overhead that is required with it's setting up, so at least a decent sized input is needed to see the improvement. Hopefully, people would explore more of vectorization with GPUs
, as it could not be stressed enough!