Optimizing algorithm calculating (sin(x)-x)*x^{-3} (in matlab)

Question

My task is to write optimal program that calculates matrix Y, given matrix X, where:

y = (sin(x)-x) x^-3

Here's the code I have written so far:

n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m; 

for i = 1:d
    x = X(i);
    if abs(x)<0.1
        Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
    else
        Y(i) = (sin(x)-x).*(x.^(-3));
    end
end

So, generally the formula was inaccurate around 0, so I have approximated it using Taylor theorem.

Unfortunately this program has accuracy of 91% and efficiency of only 24% (so it's 4 times slower than the optimal solution).

The tests are around 13 million samples, out of which around 6 million have value of less than 0.1. The range of samples is (-8π , 8π).

The target accuracy (100%) is 4*epsilon where epsilon equals 2^(-52) (that means that numbers calculated by program shouldn't be larger or smaller than numbers calculated "perfectly" than 4*epsilon).

100*epsilon means accuracy of 86%.

Do you have any ideas on how to make it faster and more accurate? I'm looking both for mathematical tricks on how to further transform given formula, and general MATLAB tips that can accelerate programs?

EDIT: Using Horner method, I have managed to bring up efficiency up to 81% (accuracy still 91%) with this program:

function Y = main(X)

Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = horner(X(i));

function y = horner (x)

pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));

Do you have any further ideas on how to improve it?

Your question is not entirely clear. What do you meanby optimizing algorithm? Do you want to find the optimal (minimal/maximum value) of the function, as in [fminsearch](https://www.mathworks.com/help/matlab/ref/fminsearch.html)? Or you want to calculate the value of the function? In that case, the for-loop is unecessary. The statement: `Y = (sin(X)-X)./X^3` should work for the entire array, without the necessity to iterate through every member of the array. — Thales, Apr 22 '20 at 16:17
@Thales by optimizing I meant to make it faster and more accurate. Calculating simply Y = (sin(X)-X)./X^3 is wrong, because the accuracy would be higly lowered around 0 (and half of the samples are going to be near 0) — MartinYakuza, Apr 22 '20 at 16:21
Could you give a numerical example of what you are asking? What is the expected range for the `x` variable? When it fails to accurately calculate? — Thales, Apr 22 '20 at 16:24
what you mean by 'Unfortunately this program has accuracy of 91% and efficiency of only 24% (so it's 4 times slower than the optimal solution).' — Thales, Apr 22 '20 at 16:29
@Thales what do you mean "numerical example"? I am supposed to calculate the given formula as fast and as precisely as possible. That's all. — MartinYakuza, Apr 22 '20 at 16:30
@Thales my proffessor gave us program (precompiled - we can't see the code) that is supposed to be "perfect solution". The values of accuraccy and efficancy are in comparsion to that program. — MartinYakuza, Apr 22 '20 at 16:32
I added an answer about the questions I asked. you have a function that is difficult to accurately calculate in a digital computer due to round-off errors and the [arithmetic of floating point numbers](https://en.wikipedia.org/wiki/IEEE_754). you have a taylor series expansion valid for the vicinity of the point x=0, which is a good approximation. You can either use the function or its aproximation. What else you want to calculate? — Thales, Apr 22 '20 at 16:44
@CrisLuengo why would I? I only need it when the outcome of substraction sin(x)-x is close to 0, not when sin(x) is close to 0 — MartinYakuza, Apr 22 '20 at 17:08
@Thales yes I know. My program is 4 times slower than it should be. Making it faster is the main reason of this question. — MartinYakuza, Apr 22 '20 at 17:11

score 2 · Answer 1 · answered Apr 22 '20 at 16:39

2

Program seems to work fine for a great range of input:

x = linspace(-8*pi,8*pi,13e6); % 13 million samples in the desired range
y = (sin(x)-x)./x.^3;
plot(x,y)

Due due round-off errors, you may have problem calculating it for very small values of x:

x = 0
y = (sin(x)-x)./x.^3

y =

   NaN

You already have the Taylor series expansion of the function around 0. As the Taylor expansion does not include a division by x, you can expect a better behaviour of the Taylor function around this region:

x = -1e-6:1e-9:1e-6;
y = (sin(x)-x)./x.^3;
y_taylor = -1/6 + x.^2/120 - x.^4/5040 + x.^6/362880;
plot(x,y,x,y_taylor); legend('y','taylor expansion','location','best')

answered Apr 22 '20 at 16:39

Thales

1,181
9
10

sooo - you have just drawn what I have been talking about? That doesn't answer my question at all. I need to make my program faster. – MartinYakuza Apr 22 '20 at 16:44
Did you checked my code? It has no for loops. It runs faster. – Thales Apr 22 '20 at 16:45
you can't calculate all samples with Taylor expansion - it is accurate only for numbers around 0, but it's inaccurate for larger numbers. – MartinYakuza Apr 22 '20 at 16:51
@MartinYakuza: This answer seems to indicate that you should only use the Taylor expansion when you're within 0.5e-6 of 0, not within 0.1. Where does your 0.1 cutoff come from? – Cris Luengo Apr 22 '20 at 17:07
@Thales using your range I get efficency of 28% (slightly higher), but accuracy of only 26% (much, much lower). I got number 0.1 by trials and errors - it gave me best accruacy and efficency combination. – MartinYakuza Apr 22 '20 at 17:20

Cris Luengo · Accepted Answer · 2020-04-22T20:49:40.687

1

You can replace your loop with vectorized code. This is usually more efficient than loop because the loop has a conditional in it, which is bad for branch prediction:

Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;

Rewriting the primary equation to avoid the cubic root yields a 3x speedup for that computation:

Y = (sin(X)./X - 1) ./ (X.*X);

Speed comparison:

The following script compares timing for this method compared to OP's loop code. I use data that has 7 million values uniformly distributed in (-8π, 8π), and another 6 million values uniformly distributed in (-0.1,0.1).

OP's loop code takes 2.4412 s, and the vectorized solution takes 0.7224 s. Using OP's Horner method and the rewritten sin expression it takes 0.1437 s.

X = [linspace(-8*pi,8*pi,7e6), linspace(-0.1,0.1,6e6)];
timeit(@()method1(X))
timeit(@()method2(X))

function Y = method1(X)
n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m; 

for i = 1:d
    x = X(i);
    if abs(x)<0.1
        Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
    else
        Y(i) = (sin(x)-x).*(x.^(-3));
    end
end
end

function Y = method2(X)
Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;
end

function Y = method3(X)
Y = (sin(X)./X - 1) ./ (X.*X);
i = abs(X) < 0.1;
Y(i) = horner(X(i));
end

function y = horner (x)
pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));
end

edited Apr 22 '20 at 20:49

answered Apr 22 '20 at 17:12

Cris Luengo

55,762
10
62
120

as I mentioned in my question, almost half of the samples are near 0, so your answer is invalid – MartinYakuza Apr 22 '20 at 17:15
as I commented above, reducing Taylor range usage gave me only small boost to efficency, but a major lose to accuracy. By trials and errors I found the number 0.1 to work best. – MartinYakuza Apr 22 '20 at 17:22
ok it appears that indeed it works faster, but it's still only 40% (so more than 2 times slower than the model program). The accuracy obviously is still 91% – MartinYakuza Apr 22 '20 at 17:35
decrasing the range of Taylor series usage rises the speed only slightly, while greatly reducing accuracy – MartinYakuza Apr 22 '20 at 17:37
it's still more than two times slower than it should be. Can you think of other ways of improving the speed? I think that maybe it's possible to transform the formula so that it is calculated quicker? – MartinYakuza Apr 22 '20 at 17:46
oh I think calculating the Taylor expansion using Horner method should be faster. – MartinYakuza Apr 22 '20 at 17:47
I have edited my question, adding what I have achieved using Horner method. Do you have any further insights? – MartinYakuza Apr 22 '20 at 20:15
@MartinYakuza: Were you told to reproduce the speed and accuracy of the compiled program in MATLAB, or is MATLAB your own choice? I think getting to 80% of the speed of a compiled program is pretty good. I C and C++ you also have access to `long double` format (10-bytes), which you can use to improve accuracy significantly. The only thing I can think of to further improve the speed of computation is not to compute the `sin` version for `~i`: `j=~i; Y(j) = (sin(X(j)-X(j)).*(X(j).^(-3));`. And not using the division, for example replacing `./362880` by `.*2.755731922398589e-06`. – Cris Luengo Apr 22 '20 at 20:33
@MartinYakuza: Rewriting the `sin` expression gives another good speed advantage (see edited answer). The ideas in my previous comment didn't. – Cris Luengo Apr 22 '20 at 20:49
yes rewriting sin expression raised efficency up to 130%! But unfortunately it slighly decrases the accuracy - it is 88%, which is too low (my program need to have at least 90% accuracy to be accepted) – MartinYakuza Apr 22 '20 at 21:39

Optimizing algorithm calculating (sin(x)-x)*x^{-3} (in matlab)

2 Answers2