What is the problem with this fixed point conversion from floating point in c?

Question

I am trying to convert floating point to fixed point because the hardware does not have floating point acceleration. I cannot seem to find my mistake for the variable screen_X the value is always 320.0. The precision is Q38,26 (edit useful bits are Q6.26, 64 bit variables are used for mul and div compatibility).

#define fixed_precision 26
#define CONV_F(A) (A*conv_fixed_precision)
#define CONV_DF(A) (A/(1<<fixed_precision))
#define MUL38_26(A, B)((long long)(A*B)>>fixed_precision)
#define DIV38_26(A, B)((long long)(A<<fixed_precision)/B)

// global variable
double conv_fixed_precision = (1 << fixed_precision);

The floating point code is the following :

void draw_balls(VGA& vga) {
   int i;
   for (i = 0; i < NB_BALL; i++) {
       if (R_Ball[i] == 0) continue;   // the pearl has already been found
       double screen_X = X_Ball[i];
       double screen_Y = Y_Ball[i];

       screen_X -= X_position;
       screen_Y -= Y_position;
       screen_X *= Scale;
       screen_Y *= Scale;

       double screen_Radius = R_Ball[i] * Scale;
       if ((screen_X * screen_X + screen_Y * screen_Y <= screen_Radius * screen_Radius)) {
           //La perle est trouvee
           R_Ball[i] = 0;
           int time = get_time();
           printf("Found %i pearl(s) at time %i\n", ++found_pearl, time);
       }

       double tmp_X = screen_X;
       double tmp_Y = screen_Y;
       screen_X = (tmp_X * cosAngle - tmp_Y * sinAngle) * 240 + 320;
       screen_Y = (tmp_X * sinAngle + tmp_Y * cosAngle) * 240 + 240;
       screen_Radius *= 240;

       if (screen_X + screen_Radius < 0) break;
       if (screen_X - screen_Radius >= 640) break;
       if (screen_Y + screen_Radius < 0) break;
       if (screen_Y - screen_Radius >= 480) break;

       drawBall3D(vga, screen_X, screen_Y, screen_Radius, i % 7);
   }
#ifdef USE_OPEN_GL
  glFlush();
#endif

and the fixed point code is the following:

void draw_balls(VGA & vga) {
    int i;

    long long cosAngF = CONV_F(cosAngle);
    long long sinAngF = CONV_F(sinAngle);
    long long scaleF = CONV_F(Scale);
    long long xPosF = CONV_F(X_position);
    long long yPosF = CONV_F(Y_position);

    for (i = 0; i < NB_BALL; i++) {
        if (R_Ball[i] == 0) continue;   // the pearl has already been found
        long long screen_X = CONV_F(X_Ball[i]);
        long long screen_Y = CONV_F(Y_Ball[i]);

        screen_X -= xPosF;
        screen_Y -= yPosF;
        screen_X = MUL38_26(screen_X, scaleF);
        screen_Y = MUL38_26(screen_Y, scaleF);

        long long screen_Radius = CONV_F(R_Ball[i] * Scale);
        if ((MUL38_26(screen_X, screen_X) + MUL38_26(screen_Y, screen_Y)) <= MUL38_26(screen_Radius, screen_Radius)) {
            //La perle est trouvee
            R_Ball[i] = 0;
            int time = get_time();
            printf("Found %i pearl(s) at time %i\n", ++found_pearl, time);
        }

        long long tmp_X = screen_X;
        long long tmp_Y = screen_Y;
        long long F240 = ((long long)240)<<fixed_precision;
        long long F320 = ((long long)320) << fixed_precision;
        
        screen_X = MUL38_26(MUL38_26(tmp_X, cosAngF) - MUL38_26(tmp_Y, sinAngF), F240) + F320;
        screen_Y = MUL38_26(MUL38_26(tmp_X, sinAngF) + MUL38_26(tmp_Y, cosAngF), F240) + F240;
        screen_Radius = MUL38_26(screen_Radius, F240);

        long long F640 = ((long long)640) << fixed_precision;
        long long F480 = ((long long)480) << fixed_precision;
        if (screen_X + screen_Radius < 0) break;
        if (screen_X - screen_Radius >= F640) break;
        if (screen_Y + screen_Radius < 0) break;
        if (screen_Y - screen_Radius >= F480) break;

        drawBall3D(vga, screen_X>>fixed_precision, screen_Y >> fixed_precision, screen_Radius >> fixed_precision, i % 7);
    }
#ifdef USE_OPEN_GL
    glFlush();
#endif
#endif
    
}

Is there any reason these macros can't be regular functions the optimizing compiler pass can't deal with? Would using [`int64_t`](https://en.cppreference.com/w/c/types/integer) make it clear what you're using versus `long long`? — tadman, Mar 16 '21 at 05:19
@hockey_buzz - What about the \ at the end of your `#define`s? It doesn't help anyone if you post wrong code. — Armali, Mar 16 '21 at 07:37
@Armali my bet its something like this: `#define conv_fixed_precision double((long long)(1ll< — Spektre, Mar 16 '21 at 10:55
@tadam, i have plateform restrictions, i use this code to simulate on my x86 personnal pc, but the actual application is used on the NIOS II processor, Macros allow for smaller assembly code from compiler (i think) since it doesn't need to load and store for new functions everytime. At the end of the day, most of my functions are going to be implemented in vhdl, i need to simulated in C before implementing because it will allow me to understand the ASM (algorithmic state machine) needed to do so. — hockey_buzz, Mar 17 '21 at 08:36
@Spektre i have updated the post to add the `conv_fixed_precision` it is actually to pass from double to fixed point variables — hockey_buzz, Mar 17 '21 at 08:38
Indeed i should update my code to be MUL6_26 which is more precise. — hockey_buzz, Mar 17 '21 at 08:42
@hockey_buzz OK I see you work-arounded the casting by using variable instead of macro, I do that too for C based on GCC on MCUs too, as their syntax quirks makes me crazy sometimes. I usually do it as `static const double` again to resolve GCC issues ... — Spektre, Mar 17 '21 at 08:43

Spektre · Answer 1 · 2021-03-16T09:15:13.197

2

The \ in your #defines is wrong

The \ tells your compiler that the define continues on next line but your defines are single line so that makes your code wrong even not compilable. My bet is you are doing this on some MCU or DSP on GCC based compiler which usually on error do not produce new binary so you most likely program with some old version that was last compilable in the past (its sometimes hard to spot this as the message build has stop is not very distinguishable from successful operation in the compile console).
you got 38.26 which adds to 64bit and use long long

the problem is that long long can be anything so how many bits it is on your environment? If just 64bit then you got problem because this:
```
A<<fixed_precision
```
would require 64+fixed_precision bits to store the result so if you got less then you throw away MSB bits of A invalidate your results (division)

This can be done on 64bit without loosing precision. Here the fixed point math I am usually implementing:
```
m = 1<<n
Integer(a) = Real(x*m)
Real(x) = Integer(a)/m
x*m + y*m = (x+y)*m         -> (a+b)
x*m - y*m = (x-y)*m         -> (a-b)
x*m * y*m = (x*y)*m*m       -> (a*b)>>n 
x*m / y*m = (x/y) + (x%y)/y -> (a/b)<<n + ((a%b)<<n)/b
```
However beware the multiplication a*b still needs 128 bits so use naive O(n^2) or Karatsuba multiplication if you do not have access to ALU like multiplication or port this to 64bit. Another option is to disect the a*b>>n into division (>>n) and modulo (&(n-1)) parts similar like I did with division. However too lazy to equate that as I got access to 64b*64b=(64+64)b operations on all platforms I code for.
signed arithmetic shifts right on 2'os complement require SAR

in case your numbers are signed then naive use of a>>n is not possible as it most likely destroys your number sign. To divide by power of 2 on 2'os complement you need SAR (Shift Arithmetic Right) so You need to copy MSB bit so for 1 bit shifts do this:
```
(a>>1)|(a&0x8000000000000000);
```
for more bit shifts you need to add branches ... for example 4,8,16 bit:
```
(a>> 4)|((a&0x8000000000000000)?0xF000000000000000:0);
(a>> 8)|((a&0x8000000000000000)?0xFF00000000000000:0);
(a>>16)|((a&0x8000000000000000)?0xFFF0000000000000:0);
```
The masks can be stored in LUT or you can use 1 bit shift in a loop but that would be slower.

edited Mar 16 '21 at 09:15

answered Mar 16 '21 at 07:59

Spektre

49,595
11
110
380

1

In what situations does C "destroy the number sign" when right-shifting a signed integer? – chtz Mar 16 '21 at 08:55
1

@chtz for any number `<0` on platform where C/C++ implementation does not use SAR. This happens for example on all x86 CPUs ... even MCUs and DSPs as `>>` usually use logical non cyclic rotation. IIRC the behavior of MSB is undefined as not any CPU has SAR ... try for example print the result of 16bit `(-2>>1)` and suddenly the result will be signed big number `0x7FFF` instead of `-1` – Spektre Mar 16 '21 at 09:09
1

I'm pretty sure that at least on x86 CPUs C uses arithmetic shifts for signed integer types: https://godbolt.org/z/er54dc (I would have assumed that this is standardized but I don't know the relevant part of the standard). – chtz Mar 16 '21 at 09:19
@chtz newer compilers usually use SAR or encode it in similar manner I did however you can not take it for granted and on MCU/DSP platform is this usually a problem. For example if I do this in my BDS2006 it is OK but on older ones it was not. GCC for MCU screw this almost always – Spektre Mar 16 '21 at 09:19
@chtz also there are a lot of CPUs/MCUs that do not have any SAR ... and as the OP is targeting this area its important to deal with it in safe manner – Spektre Mar 16 '21 at 09:22
1

Info from the C standard: _The result of E1 >> E2 is E1 right-shifted E2 bit positions. … If E1 has a signed type and a negative value, the resulting value is implementation-defined._ – Armali Mar 16 '21 at 10:38
1

@Armali exactly my point programmers take `x>>1` for granted as the equivalent to `x/2` (me included in the past till I hit a wall) but its just illusion ... that is not safe across platforms and even across compilers on the same platform – Spektre Mar 16 '21 at 10:47
@Spektre the '`' was for The stackoverflow text editor, it is not in my actual code, also i use longlong because i work with 32 bit so it's actually, Q6.26. I guess i have a lot to learn – hockey_buzz Mar 17 '21 at 08:25
1

@hockey_buzz its better to copy paste code to your posts then to retype it as that could lead to errors like this (select it and then click on the code icon in the editor and it will reformat it to markdown code (intend it by 4 spaces). Also if you want to enforce syntax highlight add (for example C++) this before code without intendation and with empty line separation between code and it: `` for different language just change the `cpp` ... – Spektre Mar 17 '21 at 08:32
1

@spektre you are right, it was my first ever code block, ithe two others of the draw_balls() functions are copy paste. Also my problem is resovled, it was actually my compiler that was mixing up my defines. I changed the line `screen_X = MUL38_26(MUL38_26(tmp_X, cosAngF) - MUL38_26(tmp_Y, sinAngF), F240) + F320;` to `screen_X = MUL38_26((MUL38_26(tmp_X, cosAngF) - MUL38_26(tmp_Y, sinAngF)), F240) + F320;` Notice the parantesis in the addiition in the middle of the MUL. – hockey_buzz Mar 17 '21 at 08:41
@hockey_buzz I see it ... but only after copy paste the stuf in text editor and align it on 2 consequent lines ... this is a common problem with multiparameter macros... I am adding parentis `()` paranoicaly because of that ... its similar to boolean expresions in `if` statements... – Spektre Mar 17 '21 at 08:48
@Spektre yes it has become a habit to do so for me as well, avoid as much heachache as possible. *laughs* – hockey_buzz Mar 17 '21 at 08:53

What is the problem with this fixed point conversion from floating point in c?

1 Answers1