1

I have question regarding how to handle some fixed point calculations. I can't figure out how to solve it. I know it is easy in floating point, but i want to figure out how to do it in fixed point.

I have a fixed point system where i am performing the following equation on a signal (vSignal):

Signal_amplified = vSignal * 10^Exp

The vSignal has an max amplitude of around 4e+05,

The system allows for representation of 2.1475e+09 (32 bit) signals. So there is some headroom for Signal_amplified.

For simplicity reason, lest just assume Exp can go from 0 to 10.

Lets say the first value is 2.8928. This value works well when calculating in floating point, since the expresson 10^2.8928 results in 781. When using a rounded floating point value 781 i get signal amplitudes of 3.0085e+08, well within the signal range.

If i try to represent the value 2.8928 with a Q format of, lets say Q12. The value changes to 11849. Now 10^11849 results in overflow.

How should one handle these large numbers?? I Could use another formatting like Q4, but even then the numbers get very large and my becomes poor. I would very much like to be able to calculate with a precision of .001, but i just can see how this should be done.

Minimal Working Example:

int vSignal = 400000

// Floatingpoint -> Goes well
double dExp = 2.89285
double dSignal_amplified = vSignal * std::pow(10,dExp)

// Fixedpoint -> Overflow
int iExp = 11848 // Q12 format
int iSignal_amplified = vSignal * std::pow(10,iExp)
iSignal_amplified =  iSignal_amplified>>12

Any ideas?

  • I'm confused about what language you are using. You've tagged _both_ C and C++, and then provided an example in Python. – paddy Oct 15 '19 at 07:29
  • Yeah sorry about that. I simulated in Python, but I am implementing in C/C++. But the same problem holds in both language. It was just to provide an example. – Mikkel Krogh Simonsen Oct 15 '19 at 07:31
  • By what method is `10^Exp` computed in fixed point? Shouldn't this calculation take into account the fixed point unit? It's not like 10^(2^(-k)) has any nice representation that would allow you to directly transfer binary places from input to output (did you plan to divide by `10^(2(-k))` afterwards?), so how would all of this work in the first place? – Max Langhof Oct 15 '19 at 07:39
  • 1
    Your fixed point example code is wrong even without overflow (that answers my above questions I guess). `10^(x/y)` is not the same as `(10^x) / y` (where `y = 2^k` in this case). [Note: Using `^` for exponentiation here, not `xor`.] – Max Langhof Oct 15 '19 at 07:45
  • "The value changes to 11849". No, it doesn't. The representation changes to `0x2e49`. The value changes to ‭2.892822265625‬. You're confusing fixed-point numbers and their binary representation. – MSalters Oct 15 '19 at 12:36

2 Answers2

1

Here is a proposal. It's just a rough idea, that needs to be adjusted and refined.

Say you need a precision of 0.01 (you can choose the precision you need of course) you can represent the exponent as: Exp = N + M*10^-1 + P*10^-2where N, M and P are integers and M and P are between 0 and 9.

Then you pre-compute and round all values for 10^(M*10^-1) * 100 and 10^(P*10^-2) * 100. They are all between 1 and 1000. Store them in a lookup table to avoid computing float operations at runtime. Let's call these lookup tables A[M] and B[P].

Then you can compute 10^Exp =( 10^N * A[M] * B[P] ) / 10000

The multiplication should not overflow since A[M] * B[P] is between 1 and 1,000,000 and A is lower than 10 according to what you said.

I did a quick test with a few values and it seems to give an acceptable precision.

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
Guillaume Petitjean
  • 2,408
  • 1
  • 21
  • 47
  • That rounding of pre-computed values sounds scary. If you happen to choose an `Exp` where e.g. all of them were rounded down you might get a pretty high error in the result? I'm also not sure how this scales to more digits - you quickly get into overflow again as you add more digits. I think it would be more promising to reduce the problem to binary exponentiation (then figuring out the result decimals is easier) and trying to control the error in the `ln(10)/ln(2)` exponent factor. – Max Langhof Oct 15 '19 at 11:28
  • An example for the rounding problem: With 12 fixed digits you would have a maximum error of 0.5 * 10^-12 in each table entry, which could aggregate to about 12 x 0.5 * 10^-12 = 6 * 10^-12 relative error (i.e. the last digit of the result may be completely lost to this error). Or in general, if you have `k` digits, you will have an error of about `k` ULP with this scheme. Could be mitigated by having higher intermediate resolution, but the multiplication of the table entries would overflow even faster. – Max Langhof Oct 15 '19 at 11:33
  • Good comments @MaxLanghof. About the precision, it's hard to say if it's acceptable. but the main problem of my proposal is that it probably overflows quickly if you increase the number of digits. I think the best solution if to look at the bigger picture: first, why can't the OP use floating point even in software ? Is there really a performance issue. If there is, how is this amplified signal used further in the software, etc... – Guillaume Petitjean Oct 15 '19 at 11:47
  • The base-10 is weird. Computers aren't decimal. Instead, you use binary, and way more than 2 digits/bits. This may seem a bit expensive, but instead of two tables `A[10]` and `B[10]` you now only have one value per bit. So for the same amount of precalculated values, you now get 20 bits of precision = 6 digits, not 2. – MSalters Oct 15 '19 at 12:42
  • As I said it was a rough idea and needs to be refined. Switching to base 2 is better, you're right. – Guillaume Petitjean Oct 15 '19 at 12:57
  • I know the ideal case would have been to work in floating point, but the all mighty cost price & time overrules all the sane options like switching to a new DSP or altering previously defined amplification calculations etc. sometimes one just have to make the better of what he/she got. None the less, a solution was found for my implementation and that was really the issue, not the if, but the how. – Mikkel Krogh Simonsen Oct 15 '19 at 13:21
1

"If i try to represent the value 2.8928 with a Q format of, lets say Q12. The value changes to 11849. Now 10^11849 results in overflow.".

Mixed-type math is pretty hard, and it looks like you should avoid it. What you want is pow(Q12(10.0), Q12(2.8928)) or possibly an optimized pow10(Q12(2.8928)). For the first, see my previous answer. The latter can be optimized by a hardcoded table of powers. pow10(2.8928) is of course pow10(2) * pow10(.5) * pow10(.25) * pow10(.125) * ... - each 1 in the binary representation of 2.8928 corresponds to a single table entry. You may want to calculate the intermediate results in Q19.44 and drop the lowest 32 bits when you return..

Edit: Precision

Storing all the values of pow10(2^-n) up to n=12 has the slight problem that the result is close to 1, namely 1.000562312. If you'd store that as a Q12, you lose precision in rounding. Instead, it may be wise to store the value of pow10(2^-12) as a Q24, the value of pow10(2^-121) as a Q23 etc. Now evaluate Q12 pow10(Q12 exp) starting at the LSB of exp, not the MSB. You need to repeatedly shift the intermediate results as you move up to pow10(0.5) but half of the time you can merge that with the >>12 that's inherent to Q12 multiplication.

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • Thank you for the elaboration. I know one should avoid Mixed-type math, but sometimes it is a necessity when the system only accepts one type and the calculations are defined using another type. I need to focus on speed so i am going with the latter option of a hardcoded table. Thx, again. – Mikkel Krogh Simonsen Oct 15 '19 at 13:14