
I am working on a project where I need to use fixed-point math, and I cannot figure out why the numbers are "rolling over". I was only able to get a large enough range by changing the shift amount from 16 to 8 and finally to 4. Here is the code I am using at present:

#define SHIFT_AMOUNT 8
#define SHIFT_MASK ((1 << SHIFT_AMOUNT) - 1)
#define FIXED_ONE (1 << SHIFT_AMOUNT)
#define INT2FIXED(x) ((x) << SHIFT_AMOUNT)
#define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))
#define FIXED2INT(x) ((x) >> SHIFT_AMOUNT)
#define FIXED2FLOAT(x) (((float)(x)) / (1 << SHIFT_AMOUNT))

int32_t test = FLOAT2FIXED(1.00);

void setup()
{
   Serial.begin(57600);
}

void loop(){

   test += FLOAT2FIXED(1.00);
   Serial.println(FIXED2FLOAT(test));

}

And the output:

1
2
3

...

127
-128
-127
-126

When SHIFT_AMOUNT = 8 I am only able to store values from -128 to 127. But since I am using a 32-bit variable, shouldn't a 16-bit shift move the decimal point to the "middle", leaving two 16-bit sections, one for the whole number and the other for the fraction? Shouldn't the whole range of the int32_t be −2,147,483,648 to 2,147,483,647 with the shift at 16? Is there a setting that I am missing, or am I just way off with how this works?

If SHIFT_AMOUNT = 4 I get the range I need, but this doesn't seem right, since all the other examples I have seen online use a 16-bit shift.

Here is a link showing what I am looking to do

EDIT

If I have this correctly, shifting by 8 bits in a 16-bit-wide type leaves 8 bits for the whole part and 8 for the fraction, giving a range of -128 to 127. Hence the need for the 4-bit shift, which increases the range of the whole part to -32,768 to 32,767. Is this correct? If so, is int32_t not truly 32 bits wide?
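For reference, the whole-number range for a given storage width and shift can be worked out directly. This is a minimal sketch, not code from the project: `whole_min` and `whole_max` are hypothetical helper names, and one bit is assumed to be reserved for the sign (two's complement).

```cpp
#include <stdint.h>

// Hypothetical helpers (not from the original post): smallest and largest
// whole-number value of a signed fixed-point type that is `total_bits` wide
// and uses `shift` fractional bits. One bit is taken by the sign.
constexpr int64_t whole_min(int total_bits, int shift) {
    return -(int64_t(1) << (total_bits - 1 - shift));
}
constexpr int64_t whole_max(int total_bits, int shift) {
    return (int64_t(1) << (total_bits - 1 - shift)) - 1;
}

// 16-bit storage
static_assert(whole_min(16, 8) == -128 && whole_max(16, 8) == 127, "8.8 format");
static_assert(whole_min(16, 4) == -2048 && whole_max(16, 4) == 2047, "12.4 format");

// 32-bit storage
static_assert(whole_min(32, 16) == -32768 && whole_max(32, 16) == 32767, "16.16 format");
static_assert(whole_min(32, 8) == -8388608 && whole_max(32, 8) == 8388607, "24.8 format");
```

Under these assumptions, a 16-bit value with an 8-bit shift tops out at 127, which matches the output above, while a true 32-bit value with the same shift would reach 8,388,607.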

EDIT2

Patrick Trentin pointed out where I was going wrong. Everything was correct except for the part I copied from the linked question: I was casting to an int, not an int32_t. The int type is 16 bits wide, hence having to use 4 to get the range I needed.

Andy Braham
  • Why not use one of the fixed point primitives that avr-gcc [already supports](https://gcc.gnu.org/wiki/avr-gcc#Fixed-Point_Support)? – Ignacio Vazquez-Abrams Jun 12 '16 at 15:51
  • @IgnacioVazquez-Abrams One of the reasons I am going through this is to learn how it works, what the results should be, and how it was made, to get a better understanding. If I don't understand how and why it works, I will not know whether a library is giving me an erroneous value or whether my other code has a bug. I think I understand how this works; I updated my question to try to make it a little clearer. – Andy Braham Jun 12 '16 at 16:25
  • @AndyBraham the size of `(int)` in Arduino is `16-bit`, why did you cast the value to `int` in the macro rather than `int32_t` ? – Patrick Trentin Jun 12 '16 at 16:35

1 Answer


Change this:

#define FLOAT2FIXED(x) ((int)((x) * (1 << SHIFT_AMOUNT)))

into this:

#define FLOAT2FIXED(x) ((int32_t)((x) * (1 << SHIFT_AMOUNT)))

Rationale: int is 16 bits wide on an Arduino Uno (see the documentation), so the cast caps the values that you store in your int32_t variable at 16 bits.
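The effect can be reproduced on a desktop compiler, where int is usually 32 bits, by narrowing to int16_t explicitly. This is only an illustrative sketch: the function names are hypothetical stand-ins for the macros, and the wrap-around shown assumes a two's-complement target (guaranteed from C++20, and what GCC and Clang do in practice before that).

```cpp
#include <stdint.h>

constexpr int SHIFT_AMOUNT = 8;

// Stand-in for the original macro on a board where int is 16 bits wide.
// int16_t is an assumption made here so the truncation shows up on a
// desktop compiler as well.
constexpr int32_t float2fixed_narrow(float x) {
    return static_cast<int16_t>(
        static_cast<int32_t>(x * (int32_t(1) << SHIFT_AMOUNT)));
}

// Stand-in for the corrected macro: the result keeps all 32 bits.
constexpr int32_t float2fixed_wide(float x) {
    return static_cast<int32_t>(x * (int32_t(1) << SHIFT_AMOUNT));
}

// Convert a fixed-point value back to float.
constexpr float fixed2float(int32_t x) {
    return static_cast<float>(x) / (int32_t(1) << SHIFT_AMOUNT);
}

// 127.0 still fits in 16 bits (127 * 256 = 32512), so both versions agree.
static_assert(float2fixed_narrow(127.0f) == float2fixed_wide(127.0f), "fits");

// 128.0 needs 32768, one more than a 16-bit signed value can hold.
static_assert(float2fixed_wide(128.0f) == 32768, "kept in 32 bits");
```

With the narrow version, `fixed2float(float2fixed_narrow(128.0f))` comes out as -128 on such a target, the same jump from 127 to -128 seen in the question's output; the wide version gives 128.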


EDIT:

The fact that int16_t is an alias of signed int, which is the same type as int, can be corroborated either by looking at the online documentation or by reading the file

arduino-version/hardware/tools/avr/lib/avr/include/stdint.h

among the Arduino Uno sources:

/** \ingroup avr_stdint
    16-bit signed type. */

typedef signed int int16_t;
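
Since the width of int differs between boards, one way to avoid the problem is to do every step in the storage type and let the compiler check the assumptions. The following is a sketch under my own assumptions rather than part of the original macros: `fixed_t` is a hypothetical name, and the constant 1 is cast before shifting because `1 << 16` overflows when int is only 16 bits wide.

```cpp
#include <stdint.h>
#include <limits.h>

typedef int32_t fixed_t;  // hypothetical name for the storage type

#define SHIFT_AMOUNT 16
// Cast the 1 first so the shift is done in 32 bits even where int is 16-bit.
#define FIXED_ONE ((fixed_t)1 << SHIFT_AMOUNT)
#define INT2FIXED(x) ((fixed_t)(x) << SHIFT_AMOUNT)
#define FLOAT2FIXED(x) ((fixed_t)((x) * FIXED_ONE))
#define FIXED2INT(x) ((x) >> SHIFT_AMOUNT)
#define FIXED2FLOAT(x) (((float)(x)) / FIXED_ONE)

// Fail the build early if the assumptions do not hold on the target.
static_assert(sizeof(fixed_t) * CHAR_BIT == 32, "fixed_t must be 32 bits wide");
static_assert(SHIFT_AMOUNT < 31, "shift must leave room for the sign bit");
```

With these definitions a 16-bit shift behaves the same whether int is 16 or 32 bits wide, for example `FLOAT2FIXED(1.0)` is 65536 on both.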
Patrick Trentin
  • I missed that one. I knew there was some sort of small issue, thank you. Is it typical for the type without any postfix to be smaller than the rest? I would have thought that int = highest possible (int32_t). – Andy Braham Jun 12 '16 at 20:31
  • @AndyBraham I extended my answer to answer your question, hope it helps. (: – Patrick Trentin Jun 12 '16 at 20:56