5

I would like to know how can the bottleneck be treated in the given piece of code.

%% Points is an Nx3 matrix having the coordinates of N points where N ~ 10^6
Z = points(:,3)
listZ = (Z >= a & Z < b); % Bottleneck
np = sum(listZ); % For later usage
slice = points(listZ,:);

Currently for N ~ 10^6, np ~ 1000 and number of calls to this part of code = 1000, the bottleneck statement is taking around 10 seconds in total, which is a big chunk of time compared to the rest of my code.

Profiling Results

Some more screenshots of a sample code for only the indexing statement as requested by @EitanT

Profiling for sample code Profiling for sample code

OrangeRind
  • 4,798
  • 13
  • 45
  • 57
  • 1
    Are you sure it's the bottleneck (can you show the profiling results)? And what is `num_calls` anyway? – Eitan T Jun 20 '13 at 15:16
  • @EitanT Yes I have checked it through the MATLAB profiler itself and this statement is indeed the bottleneck – OrangeRind Jun 20 '13 at 15:19
  • @EitanT I've added the profiling result – OrangeRind Jun 20 '13 at 15:29
  • Thanks. Can you reduce your code to a minimal example that recreates this behaviour? (Including the input data and the loop that invokes this part of code multiple times) – Eitan T Jun 20 '13 at 15:31
  • I could be wrong but doesn't `listZ = (Z >= a & Z < b);` need to be [`&&`](http://www.mathworks.com/help/matlab/matlab_prog/operators.html#f0-39129) instead of just [`&`](http://www.mathworks.com/help/matlab/ref/logicaloperatorselementwise.html)? – James Mertz Jun 20 '13 at 15:46
  • @EitanT There, I check my actual program values, updated them in the question and also posted profiles of programs operating in the same order. – OrangeRind Jun 20 '13 at 15:51
  • @KronoS - No that doesn't work here. – OrangeRind Jun 20 '13 at 15:51
  • @KronoS - `&` is an element-wise `AND` operation, `&&` is the logical AND short-circuit operator. – Dang Khoa Jun 20 '13 at 15:51
  • @DangKhoa Ya I was wrong... However I just tested this on my machine and it completed in [less than a half second](http://i.stack.imgur.com/qSUYJ.png). Are you sure that this is the code that is giving you issues? – James Mertz Jun 20 '13 at 15:56
  • What kind of CPU do you have? And what does `feature accel` give you? – Rody Oldenhuis Jun 20 '13 at 15:56
  • Why do you have this embedded in a for loop? – James Mertz Jun 20 '13 at 15:57
  • @KronoS: Except that you tested only `1e6` elements, while the OP is doing `1e9`...So, you'd take about 44 seconds. – Rody Oldenhuis Jun 20 '13 at 15:57
  • @RodyOldenhuis - I am running this on a Core i7 740QM and feature accel is 'on' already. – OrangeRind Jun 20 '13 at 15:59
  • @KronoS You have generated integers, not doubles, and the looping in the sample code is done to simulate what happens in he real code as shown in the first screenshot - In real the loop runs for different slices possible in a data (I hope the variable names in the first screenshot) are clear enough – OrangeRind Jun 20 '13 at 16:01
  • 2
    Related: [this question](http://stackoverflow.com/questions/12137233/matlab-performance-comparison-slower-than-arithmetic). – Rody Oldenhuis Jun 20 '13 at 16:29

3 Answers3

8

If the equality on one side is not important you can reformulate it to a one-sided comparison and it gets one order of magnitude faster:

Z = rand(1e6,3);
a=0.5; b=0.6;
c=(a+b)/2;
d=abs(a-b)/2;
tic
for k=1:100,
    listZ1 = (Z >= a & Z < b); % Bottleneck
end
toc

tic
for k=1:100,
    listZ2 = (abs(Z-c)<d);
end
toc

isequal(listZ1, listZ2)

returns

Elapsed time is 5.567460 seconds.
Elapsed time is 0.625646 seconds.

ans =

     1
Mohsen Nosratinia
  • 9,844
  • 1
  • 27
  • 52
  • 1
    Ah! That reminds me of [something I asked in the past](http://stackoverflow.com/questions/12137233/matlab-performance-comparison-slower-than-arithmetic). Indeed, I think this is the way to go. – Rody Oldenhuis Jun 20 '13 at 16:30
  • This is good! Although I'm getting a little less than an order of magnitude in the actual program maybe because of the complexity of the rest of the code - but still I think something more could be done. – OrangeRind Jun 20 '13 at 19:21
  • 2
    See also the accepted answer to [this recent C programming question](http://stackoverflow.com/a/17095534/1165522). – horchler Jun 20 '13 at 20:55
3

Assuming the worst case:

  • element-wise & is not short-circuited internally
  • the comparisons are single-threaded

You're doing 2*1e6*1e3 = 2e9 comparisons in ~10 seconds. That's ~200 million comparisons per second (~200 MFLOPS).

Considering you can do some 1.7 GFLops on a single core, this indeed seems rather low.

Are you running Windows 7? If so, have you checked your power settings? You are on a mobile processor, so I expect that by default, there will be some low-power consumption scheme in effect. This allows windows to scale down the processing speed, so...check that.

Other than that....I really have no clue.

Rody Oldenhuis
  • 37,726
  • 7
  • 50
  • 96
  • Nice point to note there - but I've made sure about the power plan - and during computation turbo boost is also kicking in since this is running single threaded. I shall check my CPU's throughput through geekbench and let you know though. – OrangeRind Jun 20 '13 at 16:14
1

Try doing something like this:

for i = 1:1000
    x = (a >= 0.5);
    x = (x < 0.6);
end

I found it to be faster than:

for i = 1:1000
    x = (a >= 0.5 & a < 0.6);
end

by about 4 seconds:

Elapsed time is 0.985001 seconds. (first one)
Elapsed time is 4.888243 seconds. (second one)

I think the reason for your slowing is the element wise & operation.

James Mertz
  • 8,459
  • 11
  • 60
  • 87