C/C++ - How to convert from a signed 32bit integer to a float and back

Question

I need to be able to convert a C SInt32 integer to a float in the range [-1, 1] and back. I've seen discussions of this question regarding 24 bit integers:

C/C++ - Convert 24-bit signed integer to float

And I've tried something similar:

 // Convert int - float
 SInt32 integer = 1;
 Float32 factor = 1;
 Float32 f = integer / (0x7FFFFFF + 0.5);

 // Perform some processing on the float
 Process(f);

 // Scale the float
 f = f * factor;

 // Convert float - int
 integer = f * (0x7FFFFFF + 0.5);

However this doesn't work. I know it doesn't work because the work I'm doing involves audio programming and the conversion causes a hissing sound.

I'm pretty sure it is a conversion problem because when I make the float smaller by setting the factor to 0.0001 the crackling disappears. Maybe the back conversion is pushing the int outside its limits, causing it to be truncated.

Any advice would be greatly appreciated.

score 4 · Answer 1 · answered Oct 29 '12 at 17:09

Read up on IEEE floating point formats. The IEEE 32-bit float only supports 24 significant bits, so if you convert a 32-bit integer you will lose the low 8 bits.
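
To make the precision limit concrete, here is a minimal sketch that round-trips an integer through a `float`. The helper name `through_float` is hypothetical, and the sketch assumes `float` is an IEEE 754 32-bit type with the default round-to-nearest mode:

```c
#include <stdint.h>

/* Hypothetical helper: round-trip a 32-bit integer through a float.
   Assumes float is IEEE 754 binary32 (24-bit significand) and that the
   rounded value is still within int32_t range when cast back. */
static int32_t through_float(int32_t x)
{
    /* volatile forces the value to be stored as a real 32-bit float,
       so no extra intermediate precision is kept */
    volatile float f = (float)x;
    return (int32_t)f;
}
```

Under those assumptions, `through_float(16777216)` (2^24) comes back unchanged, while `through_float(16777217)` (2^24 + 1) comes back as 16777216, because the lowest bit no longer fits in the significand.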

answered Oct 29 '12 at 17:09

user9876

Thanks, so I need to use a 24 bit int. – James Andrews Oct 29 '12 at 17:15
@BenSmiley Or convert to `double`, that gives you 53 bits of precision (usually). – Daniel Fischer Oct 29 '12 at 20:06
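
A rough sketch of the `double` suggestion above, assuming `double` is an IEEE 754 64-bit type. The function names are made up for illustration. Dividing and multiplying by 2^31 only changes the exponent, so every 32-bit integer should survive the round trip as long as nothing modifies the value in between:

```c
#include <stdint.h>

/* Hypothetical helpers; assume double is IEEE 754 binary64, whose 53-bit
   significand holds every 32-bit integer exactly. */
static double int32_to_unit(int32_t x)
{
    return x / 2147483648.0;            /* divide by 2^31: result in [-1, 1) */
}

static int32_t unit_to_int32(double d)
{
    /* Only valid for d in [-1, 1): exactly 1.0 would scale to 2^31,
       which does not fit in an int32_t. */
    return (int32_t)(d * 2147483648.0);
}
```

Note that this maps integers onto [-1, 1), not the closed interval [-1, 1], so a processed sample that reaches exactly 1.0 still has to be handled separately.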

score 2 · Accepted Answer · answered Oct 29 '12 at 17:12

 // 2^31 written as 2^15 * 2^16: hopefully the compiler calculates this in
 // advance, and a semi-advanced programmer can immediately spot where the
 // value comes from.
 const float recip = 1.0 / (32768.0*65536.0);
 float value = int_value * recip;
 int value2 = value * (32768.0*65536.0);

The process is not reversible: one can lose up to 7 bits of accuracy.

answered Oct 29 '12 at 17:12

Aki Suihkonen

Multiplying by these values isn't exactly what the OP wanted: the float range includes both +1 and -1, whereas the integers range over [-2^n .. 2^n-1]. However, multiplying or dividing by (2^n - 1) produces slightly more noise. – Aki Suihkonen Oct 29 '12 at 17:21
The question indicates that the float may have the value 1. (“[-1, 1]” denotes a closed interval; it includes its endpoints.) The calculation for `value2` will convert a 1 in `value` to 2,147,483,648, which overflows a signed 32-bit integer. – Eric Postpischil Oct 29 '12 at 17:53
@EricPostpischil - yes, I noticed. Luckily float 1.0 * 2^31 as an integer does not overflow, but it saturates to MAX_INT according to the IEEE-754 standard. I think it's a better design choice to clip a rare sample than to introduce quantization noise to every other sample. – Aki Suihkonen Oct 29 '12 at 20:07
IEEE 754-2008 says, in clause 7.2, that conversion to an integer when the source is outside the destination range causes an invalid operation exception. The 2011 C standard says, in clause F.4, that the invalid operation exception is raised and the result is unspecified. If the C implementation does not support annex F (which many do not), clause 6.3.1.4 says the behavior is undefined. – Eric Postpischil Oct 29 '12 at 20:27
-1, this will not only lose up to 7 bits; it can lose all bits. Assume int is 32 bits: float = 1.0 * (32768.0*65536.0) will overflow, and the value would become negative: INT32_MIN. – j-a Aug 13 '18 at 06:25
Why isn't that 32767 instead of 32768? I was using f*32767.0*65536.0 for 32-bit float to int audio conversion and sometimes it produced clicks. Changing to 32767 solved it; to be honest, I don't have any clue why. – Harry Jan 24 '19 at 16:03
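
One way to sidestep the out-of-range cast discussed in these comments is to clamp explicitly before converting, so the code does not depend on what the hardware happens to do. This is only a sketch: `float_to_int32_clamped` is a hypothetical name, and mapping NaN to 0 is an arbitrary choice made for the example.

```c
#include <stdint.h>

/* Hypothetical helper: convert a sample nominally in [-1, 1] to a 32-bit
   integer, saturating explicitly instead of relying on an out-of-range
   float-to-int cast (which C leaves undefined or unspecified). */
static int32_t float_to_int32_clamped(float f)
{
    const float scale = 2147483648.0f;   /* 2^31, exactly representable */
    float scaled = f * scale;

    if (scaled != scaled)                /* NaN: arbitrary choice, return 0 */
        return 0;
    if (scaled >= scale)                 /* 2^31 or more exceeds INT32_MAX */
        return INT32_MAX;
    if (scaled <= -scale)                /* at or below INT32_MIN */
        return INT32_MIN;
    return (int32_t)scaled;              /* now guaranteed to be in range */
}
```

With this, an input of exactly 1.0f is clipped to INT32_MAX rather than scaled to 2^31, which is the "clip a rare sample" trade-off mentioned above, made explicit in code.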
