1

I wrote a heap sort function in MATLAB and it works fine, except that when the length of input is greater or equal to 1000, it can take a long time (e.g. the length of 1000 takes half a second). I'm not sure if it's that MATLAB doesn't run very fast on heap sort algorithm or it's just my code needs to be improved. My code is shown below:

function b = heapsort(a)

[~,n] = size(a);
b = zeros(1,n);
for i = 1:n
    a = build_max_heap(a);
    b(n+1-i) = a(1);

    temp = a(1);
    a(1) = a(n+1-i);
    a(n+1-i) = temp;

    a(n+1-i) = [];
    a = heapify(a,1);
end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function a = build_max_heap(a)
[~,n] = size(a);
m = floor(n/2);
for i = m:-1:1
    a = heapify(a,i);
end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function a = heapify(a,i)
[~,n] = size(a);

left = 2*i;
right = 2*i + 1;

if left <= n 
    if a(left) >= a(i)
        large = left;
    else
        large = i;
    end
else
    return
end
if right <= n
    if a(right) >= a(large)
        large = right;
    end
end

if large ~= i
    temp = a(large);
    a(large) = a(i);
    a(i) = temp;
    a = heapify(a,large);
end
end

I'm aware that maybe it's the code a(n+1-i) = []; that may consume a lot of time. But when I changed the [] into -999 (lower than any number of the input vector), it doesn't help but took even more time.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
  • Try using the [MATLAB profiler](http://uk.mathworks.com/help/matlab/ref/profile.html) ? – Paul R Feb 08 '16 at 07:06
  • I am not entirely sure but it seems as if you assume that values are passed by reference. That is unfortunately not the case for Matlab :(. Matlab uses copy-on-write, which means it will do a copy when you modify the input. You may be able to confirm this by using a sorted vector and a random vector. In case the result is much worse than worse case big O, this may likely be the problem. Try for different size vectors. Apart from that, the sort function in Matlab is said to be fast. – patrik Feb 08 '16 at 07:27
  • I can't back this up with references, but a Matlab expert friend of mine told me to never call functions within large loops, but rather "post" the functions directly in the loops themselves, due some overhead when calling the functions repeatedly in the loop. So one test could be to include both `build_max_heap` and `heapify` (simply for testing this hypothesis). Possibly no longer an issue depending on which Matlab version you're using. – dfrib Feb 08 '16 at 07:28
  • @dfri I'm using MATLAB R2015b and other sort functions written by me is perfectly fine. So it likely not the case. – BurgerBurglar Feb 08 '16 at 08:16
  • @patrik I do know MATLAB functions pass arguments by value, not reference, so I wrote my function like `a = somefunction (a)`. Still no clue why it works so slowly. – BurgerBurglar Feb 08 '16 at 08:18
  • 1
    @PaulR Profiler exactly solved my problem. It turns out that `[~,n] = size(a);` takes much more time than `n = length(a);` I would never think of that without your help. Thanks a ton. – BurgerBurglar Feb 08 '16 at 08:36
  • @BurgerBurglar Interesting, will Matlab be able to optimize `a = somefunction (a)`? I was not aware of that so thank you! Apart from this, copy-on-write is not exactly pass-by value. Still I guess you get the point. – patrik Feb 10 '16 at 07:41

1 Answers1

4

You should use the profiler to check which lines that takes the most time. It's definitely a(n+1-i) = []; that's slowing down your function.

Resizing arrays in loops is very slow, so you should always try to avoid it.

A simple test:

  • Create a function that takes a large vector as input, and iteratively removes elements until it's empty.
  • Create a function that takes the same vector as input and iteratively sets each value to 0, Inf, NaN or something else.

Use timeit to check which function is faster. You'll see that the last function is approximately 100 times faster (depending on the size of the vector of course).

The reason why -999 takes more time is most likely because a no longer gets smaller and smaller, thus a = heapify(a,1); won't get faster and faster. I haven't tested it, but if you try the following in your first function you'll probably get a much faster program (you must insert the n+1-i) other places in your code as well, but I'll leave that to you):

a(n+1-ii) = NaN;
a(1:n+1-ii) = heapify(a(1:n+1-ii),1);

Note that I changed i to ii. That's partially because I want to give you a good advice, and partially to avoid being reminded to not use i and j as variables in MATLAB.

Community
  • 1
  • 1
Stewie Griffin
  • 14,889
  • 11
  • 39
  • 70
  • I agree with your insight, OP however claims that if he does not remove the element but sets it to `-999` it is even slower than removing it. – Adriaan Feb 08 '16 at 07:15
  • 1
    @Adriaan. true. And the reason is most likely because `a` gets smaller and smaller, thus `a = heapify(a,1);` takes shorter and shorter time. – Stewie Griffin Feb 08 '16 at 07:19
  • Thank you very much for your advice, but `a(n+1-i) = NaN;` still works slower than `a(n+1-i) = [];`, but faster than `a(n+1-i) = -999;` though. I haven't learnt the profile function so I may comment on that later. – BurgerBurglar Feb 08 '16 at 08:01
  • Sorry I'm bit of a newbie in this site and I'm not familiar with how to include my code in the comment. I deleted that comment and changed `i` into `ii`. Still no luck :-( – BurgerBurglar Feb 08 '16 at 08:12
  • I learnt how to use MATLAB Profiler just now and it turns out that `[~,n] = size(a);` takes much more time than `n = length(a);` It's a rookie mistake. Thank you very much for your time and patience. – BurgerBurglar Feb 08 '16 at 08:38