19

Is there a clever/efficient algorithm for determining the length of the hypotenuse of a right triangle (i.e. sqrt(a² + b²)), using fixed-point math on an embedded processor without hardware multiply?

Bitterblue
Tim
  • Can you avoid the `sqrt`? E.g. only compare `lenSquared` instead of `len`? A lot's going to depend on your processor. Can you tell us what it is? – leander Aug 17 '10 at 20:01
  • In this case, the sqrt is necessary. The application involves manipulating data from an accelerometer and running it through a non-linear filter. The processor in question has an 8-bit RISC instruction set, the Atmel ATTiny44A (datasheet: http://www.atmel.com/dyn/resources/prod_documents/doc8183.pdf). – Tim Aug 17 '10 at 20:11
  • Bets on PIC10-16 or *tiny*AVR. – Nick T Aug 17 '10 at 20:11
  • What's the resolution of the inputs and the output? – Stephen Canon Aug 17 '10 at 20:28
  • By the way, anyone interested in embedded processing should consider joining the [Electronics and Robotics](http://area51.stackexchange.com/proposals/2651/electronics-and-robotics) stackexchange. – Kevin Vermeer Aug 17 '10 at 20:46
  • There are some serious terminology issues here. The hypotenuse is the *line segment* opposite the right angle of a right triangle. You want the *length* of the hypotenuse. Moreover, you may (or may not) actually want the inverse rather than the Pythagorean expression. – dmckee --- ex-moderator kitten Aug 17 '10 at 21:41
  • @dmckee: He wants the absolute length of vector (a,b), or possibly (a,b,c) – Nick T Aug 18 '10 at 14:09
  • this is rough but works for me: `h=(min(a,b)>>1)+max(a,b)` where a&b are abs(a) and abs(b) respectively – EkriirkE Jun 04 '14 at 20:30

7 Answers

24

If the result doesn't have to be particularly accurate, you can get a crude approximation quite simply:

Take absolute values of a and b, and swap if necessary so that you have a <= b. Then:

h = ((sqrt(2) - 1) * a) + b

To see intuitively how this works, consider the way that a shallow angled line is plotted on a pixel display (e.g. using Bresenham's algorithm). It looks something like this:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | | | | | | | | | | | | |*|*|*|    ^
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    |
| | | | | | | | | | | | |*|*|*|*| | | |    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    |
| | | | | | | | |*|*|*|*| | | | | | | | a pixels
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    |
| | | | |*|*|*|*| | | | | | | | | | | |    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    |
|*|*|*|*| | | | | | | | | | | | | | | |    v
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 <-------------- b pixels ----------->

For each step in the b direction, the next pixel to be plotted is either immediately to the right, or one pixel up and to the right.

The ideal line from one end to the other can be approximated by the path which joins the centre of each pixel to the centre of the adjacent one. This is a series of a segments of length sqrt(2), and b-a segments of length 1 (taking a pixel to be the unit of measurement). Hence the above formula.

This clearly gives an accurate answer for a == 0 and a == b; but gives an over-estimate for values in between.

The error depends on the ratio b/a; the maximum error occurs when b = (1 + sqrt(2)) * a and turns out to be 2/sqrt(2+sqrt(2)), or about 8.24% over the true value. That's not great, but if it's good enough for your application, this method has the advantage of being simple and fast. (The multiplication by a constant can be written as a sequence of shifts and adds.)
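A minimal C sketch of this method (the constant 53/128 ≈ 0.4142 standing in for sqrt(2) − 1, and the function name, are my choices, not part of the original answer):

```c
#include <stdlib.h>

/* Crude integer hypotenuse: h ~= (sqrt(2)-1)*min(|a|,|b|) + max(|a|,|b|).
 * sqrt(2)-1 ~= 0.4142 is approximated by 53/128, so the "multiply"
 * is just shifts and adds -- no hardware MUL needed.
 * (On a 16-bit int target, make sure ua << 5 fits the type.) */
unsigned approx_hypot(int a, int b)
{
    unsigned ua = (unsigned)abs(a);
    unsigned ub = (unsigned)abs(b);
    if (ua > ub) {                /* ensure ua <= ub */
        unsigned t = ua;
        ua = ub;
        ub = t;
    }
    /* 53*ua = 32*ua + 16*ua + 4*ua + ua, then divide by 128 */
    return ub + (((ua << 5) + (ua << 4) + (ua << 2) + ua) >> 7);
}
```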

Matthew Slattery
  • That's quite a clever approximation. – caf Aug 18 '10 at 02:16
  • Nice; but you should probably mention to take the absolute value of both a and b as well. +1 for some brief error analysis – Nick T Aug 18 '10 at 14:16
  • The formula above gives an octagonal profile, which works for me. To break it down further and stick to integers (no floats!), taking sqrt(2)−1 as 0.414213562, scaling up and rounding to 4142, and dividing back down later (/10000) works quite fast, many times faster than with the float constant: `h=4142*abs(a)/10000+abs(b)`. Thanks! ** That ratio is awfully close to 1:2, so using just bit shifting and add I get a very good facsimile: `h=(a>>1)+b` – EkriirkE Jun 04 '14 at 08:33
  • Nice idea. The error can be up to 10%, though. (I'm getting an error of 19 for a length of 231, for example. The error can also be under 1% for some values, but it's generally 5-7%.) – Brent Faust Aug 18 '14 at 23:46
  • @MatthewSlattery could you provide a reference to this method? I know it's been a while but I desperately need to cite this. – yildirimyigit Aug 08 '15 at 13:59
11

For the record, here are a few more approximations, listed in roughly increasing order of complexity and accuracy. All these assume 0 ≤ a ≤ b.

  • h = b + 0.337 * a // max error ≈ 5.5 %
  • h = max(b, 0.918 * (b + (a>>1))) // max error ≈ 2.6 %
  • h = b + 0.428 * a * a / b // max error ≈ 1.04 %

Edit: to answer Ecir Hana's question, here is how I derived these approximations.

First step. Approximating a function of two variables can be a complex problem. Thus I first transformed this into the problem of approximating a function of one variable. This can be done by choosing the longest side as a “scale” factor, as follows:

h = √(b² + a²)
  = b √(1 + (a/b)²)
  = b f(a/b)    where f(x) = √(1 + x²)

Adding the constraint 0 ≤ a ≤ b means we are only concerned with approximating f(x) in the interval [0, 1].

Below is the plot of f(x) in the relevant interval, together with the approximation given by Matthew Slattery (namely (√2−1)x + 1).

[Plot: function to approximate]

Second step. The next step is to stare at this plot while asking yourself the question “how can I approximate this function cheaply?”. Since the curve looks roughly parabolic, my first idea was to use a quadratic function (third approximation). But since this is still relatively expensive, I also looked at linear and piecewise-linear approximations. Here are my three solutions:

[Plot: three approximations]

The numerical constants (0.337, 0.918 and 0.428) were initially free parameters. The particular values were chosen to minimize the maximum absolute error of the approximations. The minimization could certainly be done by some algorithm, but I just did it “by hand”, plotting the absolute error and tuning the constant until it was minimized. In practice this works quite fast; writing the code to automate it would have taken longer.

The third step is to come back to the initial problem of approximating a function of two variables:

  • h ≈ b (1 + 0.337 (a/b)) = b + 0.337 a
  • h ≈ b max(1, 0.918 (1 + (a/b)/2)) = max(b, 0.918 (b + a/2))
  • h ≈ b (1 + 0.428 (a/b)²) = b + 0.428 a²/b

Edgar Bonet
7

Consider using CORDIC methods. Dr. Dobb's has an article and associated library source here. Square-root, multiply and divide are dealt with at the end of the article.
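As an illustrative sketch of the CORDIC vectoring idea (this is my own code, not taken from the Dr. Dobb's library; the iteration count and gain constants are my choices): the vector is rotated toward the x axis with shift-add pseudo-rotations, so its magnitude accumulates in x, scaled by the CORDIC gain K ≈ 1.64676, which can itself be undone with shifts.

```c
#include <stdint.h>

#define CORDIC_ITERS 15

/* CORDIC vectoring-mode magnitude: no multiply, no divide,
 * only shifts, adds and compares. */
int32_t cordic_mag(int32_t x, int32_t y)
{
    if (x < 0) x = -x;
    if (y < 0) y = -y;

    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t xt = x, yt = y;
        if (yt >= 0) {            /* rotate clockwise, driving y to 0 */
            x = xt + (yt >> i);
            y = yt - (xt >> i);
        } else {                  /* rotate counter-clockwise */
            x = xt - (yt >> i);
            y = yt + (xt >> i);
        }
    }
    /* x now holds |(x0, y0)| * K with K ~= 1.64676; undo the gain
     * with shifts: 1/K ~= 0.60725 ~= 1/2 + 1/8 - 1/64 - 1/512 */
    return (x >> 1) + (x >> 3) - (x >> 6) - (x >> 9);
}
```

The per-iteration truncation means the result is only accurate to a few counts, so inputs should be scaled up well above unity before calling it.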

Clifford
  • Note: I have used this library and found an error in the log() function. This is corrected by adding a `0x0LL` to the end of the `log_two_power_n_reversed[]` array initialiser. I have confirmed this correction with the author. – Clifford Oct 20 '10 at 09:19
7

One possibility looks like this:

#include <math.h>

/* Iterations   Accuracy
 *  2          6.5 digits
 *  3           20 digits
 *  4           62 digits
 * assuming a numeric type able to maintain that degree of accuracy in
 * the individual operations.
 */
#define ITER 3

double dist(double P, double Q) {
/* A reasonably robust method of calculating `sqrt(P*P + Q*Q)'
 *
 * Transliterated from _More Programming Pearls, Confessions of a Coder_
 * by Jon Bentley, pg. 156.
 */

    double R;
    int i;

    P = fabs(P);
    Q = fabs(Q);

    if (P<Q) {
        R = P;
        P = Q;
        Q = R;
    }

/* The book has this as:
 *  if P = 0.0 return Q; # in AWK
 * However, this makes no sense to me - we've just ensured that P>=Q, so
 * P==0 only if Q==0;  OTOH, if Q==0, then distance == P...
 */
    if ( Q == 0.0 )
        return P;

    for (i=0;i<ITER;i++) {
        R = Q / P;
        R = R * R;
        R = R / (4.0 + R);
        P = P + 2.0 * R * P;
        Q = Q * R;
    }
    return P;
}

This still does a couple of divides and four multiplies per iteration, but you rarely need more than three iterations (and two is often adequate) per input. At least on most processors I've seen, that will generally be faster than the sqrt would be on its own.

For the moment it's written for doubles, but assuming you've implemented the basic operations, converting it to work with fixed point shouldn't be terribly difficult.

Some doubts have been raised by the comment about "reasonably robust". At least as originally written, this was basically a rather backhanded way of saying that "it may not be perfect, but it's still at least quite a bit better than a direct implementation of the Pythagorean theorem."

In particular, when you square each input, you need roughly twice as many bits to represent the squared result as you did to represent the input value. After you add (which needs only one extra bit) you take the square root, which gets you back to needing roughly the same number of bits as the inputs. Unless you have a type with substantially greater precision than the inputs, it's easy for this to produce really poor results.

This algorithm doesn't square either input directly. It is still possible for an intermediate result to underflow, but it's designed so that when it does so, the result still comes out as well as the format in use supports. Basically, the situation in which it happens is that you have an extremely acute triangle (e.g., something like 90 degrees, 0.000001 degrees, and 89.99999 degrees). If it's close enough to 90, 0, 90, we may not be able to represent the difference between the two longer sides, so it'll compute the hypotenuse as being the same length as the other long side.

By contrast, when the Pythagorean theorem fails, the result will often be a NaN (i.e., tells us nothing) or, depending on the floating point format in use, quite possibly something that looks like a reasonable answer, but is actually wildly incorrect.

Jerry Coffin
  • What does "*reasonably robust*" mean? Does that mean that it does not work in all cases!? – Clifford Aug 17 '10 at 20:37
  • There's no way that math.h and a double type is going to fit on an ATTiny. You get 4k of program space, maximum, and both division and multiplication are going to be in software. However, this would work well for a processor with a hardware multiply (or divide) instruction. – Kevin Vermeer Aug 17 '10 at 20:38
  • @reemevnivek: yes, but trying to present it for an unknown fixed-point format is next to impossible. `<math.h>` is being used only for `fabs`, so its inclusion (by itself) would mean next to nothing (including a header normally only declares things; only what you *use* goes in the executable). Ultimately you're right though: as the final sentence makes clear, I certainly wouldn't expect to use this as-is on a micro-controller. – Jerry Coffin Aug 17 '10 at 20:53
  • Agreed; I merely thought that "reasonably robust" without qualification or quantification was a bit weak as a comment, and would require analysis on the part of the user to determine the consequences of using this code. It may have meant that it would fail in non-deterministic ways rather than just for a deterministic range of inputs, or it may simply refer to its precision in all cases rather than its correctness in some cases. The comment alone would ring alarm bells for example if this were a safety-critical application. – Clifford Aug 18 '10 at 09:42
  • @JerryCoffin wow, 10 years on, you resurrect this? I think editing the answer to quantify what you meant was what I was after - 10 years ago. Not for my own benefit; I could work that out for myself. Rather it was a suggestion to improve an already good answer, and remove any doubt that the statement might instill in any novice. Comments don't really suffice for that. – Clifford Dec 29 '20 at 23:22
  • @Clifford: Fair enough--I've done a bit of editing to try to explain the situation better. – Jerry Coffin Dec 30 '20 at 00:01
5

You can start by reevaluating if you need the sqrt at all. Many times you are calculating the hypotenuse just to compare it to another value - if you square the value you're comparing against you can eliminate the square root altogether.
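A minimal sketch of the squared-comparison trick (the function name and types are mine):

```c
#include <stdint.h>

/* Test whether the vector (a, b) is longer than `limit` without
 * taking a square root: compare squared lengths instead.
 * The squares need a type twice as wide as the inputs. */
int longer_than(int16_t a, int16_t b, int16_t limit)
{
    int32_t len_sq   = (int32_t)a * a + (int32_t)b * b;
    int32_t limit_sq = (int32_t)limit * limit;
    return len_sq > limit_sq;
}
```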

Mark Ransom
4

Unless you're doing this at >1 kHz, multiply even on an MCU without hardware MUL isn't terrible. What's much worse is the sqrt. I would try to modify my application so it doesn't need to calculate it at all.

Standard libraries would probably be best if you actually need it, but you could look at using Newton's method as a possible alternative. It would require several multiply/divide cycles to perform, however.
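If the square root genuinely can't be avoided, the binary-search-style approach mentioned in the comments can even be done with no multiplies at all. Here is a sketch of the classic digit-by-digit (shift-and-subtract) integer square root (my code, not from any of the libraries mentioned):

```c
#include <stdint.h>

/* Integer square root by the shift-and-subtract (digit-by-digit)
 * method: no multiply, no divide -- only shifts, adds and compares.
 * Returns floor(sqrt(n)). */
uint16_t isqrt32(uint32_t n)
{
    uint32_t rem  = n;
    uint32_t root = 0;
    uint32_t bit  = UINT32_C(1) << 30;   /* highest power of 4 <= 2^30 */

    while (bit > rem)                    /* start at the top set "digit" */
        bit >>= 2;

    while (bit != 0) {
        if (rem >= root + bit) {         /* can we afford this bit? */
            rem  -= root + bit;
            root  = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}
```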

AVR resources

Nick T
  • You don't need to divide to approximate a square root. You can easily just use a binary-search type algorithm that only does one multiply per bit and no divides. – Gabe Aug 17 '10 at 20:32
1

Maybe you could use some of Elm Chan's assembler libraries and adapt the ihypot function to your ATtiny. You would need to replace the MUL and maybe (I haven't checked) some other instructions.

Juri Robl