
I'm tasked with writing a function (in C) that converts a float value to an IEEE-style floating-point representation. I think I understand how to do it on paper, but when I try to code it, I'm confused about how to store the bit pattern.

So if I have the value 3948.125, which is 111101101100.001 in binary, and I'm multiplying the .125 by 2 until it's done, how do I store whether each bit is 0 or 1 as I go along?

0.125 x 2 = 0.25    0
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1
= 0.001

In the end the value has to be returned as an int.

Here is part of the code for converting the decimal part:

while (decimal != 0) {
    decimal = decimal * 2;
    if (decimal >= 1) {   // >= so that exactly 1.0 (e.g. 0.5 * 2) counts as a 1 bit
        // here I would like to store a 1 somewhere
        decimal = decimal - 1;
    }
    else {
        // here I would like to store a 0 somewhere
    }
}
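One common way to fill in the two "store" comments is to collect the bits in an unsigned integer: shift the accumulated bits left by one each step, then OR in a 1 when needed. The following is only a minimal sketch. The name `frac_to_bits` and the `max_bits` limit are hypothetical, and it assumes the input satisfies 0 <= decimal < 1.

```c
/* Hypothetical helper: collect the first max_bits binary digits of
   `decimal` (assumed 0 <= decimal < 1) into an unsigned integer,
   most significant digit first. */
unsigned frac_to_bits(double decimal, int max_bits)
{
    unsigned bits = 0;

    for (int i = 0; i < max_bits; i++) {
        decimal = decimal * 2;
        bits <<= 1;               /* make room for the next bit (stores a 0) */
        if (decimal >= 1) {
            bits |= 1u;           /* turn that new low bit into a 1 */
            decimal = decimal - 1;
        }
    }
    return bits;
}
```

With this sketch, `frac_to_bits(0.125, 3)` returns 1, which is the pattern 001 from the worked example above. Bounding the loop by a bit count rather than by `decimal != 0` keeps the result within the width of the target field.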

As another example, an input of 18.113 should produce an output of 18.0937500000. I have to convert a 32-bit float value and output an integer representation that uses only 15 bits, with 5 bits for exp and 9 bits for frac.

I understand how to do it on paper. When I try to code it, I'm confused about how to store the bits as I calculate them. The code provided only converts the decimal part of the input into binary.
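For illustration, here is a rough sketch of how the whole value might be packed into 15 bits using only arithmetic, shifts and ORs. Several things in it are assumptions that the question does not state: the layout (1 sign bit, then 5 exp bits, then 9 frac bits), an exponent bias of 15, truncation of extra fraction bits, and the names `encode15` and `decode15`. It also assumes a finite input whose exponent fits in 5 bits, and it does not handle denormals, infinities or NaN.

```c
/* Assumed layout (hypothetical): bit 14 = sign, bits 13..9 = exp, bits 8..0 = frac.
   Assumed bias: 15. Extra fraction bits are truncated, not rounded. */
unsigned encode15(float value)
{
    unsigned sign = 0, frac = 0;
    int exp = 0;

    if (value < 0) { sign = 1; value = -value; }
    if (value == 0) return sign << 14;

    /* Normalize to 1.xxxx * 2^exp (finite input assumed). */
    while (value >= 2.0f) { value /= 2.0f; exp++; }
    while (value < 1.0f)  { value *= 2.0f; exp--; }

    value -= 1.0f;                      /* drop the implicit leading 1 */
    for (int i = 0; i < 9; i++) {       /* collect 9 fraction bits */
        value *= 2.0f;
        frac <<= 1;
        if (value >= 1.0f) { frac |= 1u; value -= 1.0f; }
    }
    return (sign << 14) | ((unsigned)(exp + 15) << 9) | frac;
}

/* Inverse of the sketch above, to show which value the 15 bits represent. */
float decode15(unsigned bits)
{
    unsigned frac = bits & 0x1FFu;
    int exp = (int)((bits >> 9) & 0x1Fu) - 15;
    float v = 1.0f + (float)frac / 512.0f;

    if ((bits & 0x3FFFu) == 0) v = 0.0f;          /* all-zero exp and frac */
    while (exp > 0) { v *= 2.0f; exp--; }
    while (exp < 0) { v /= 2.0f; exp++; }
    return ((bits >> 14) & 1u) ? -v : v;
}
```

Under these assumptions, `encode15(18.113f)` gives exp field 19 and frac field 001000011, and `decode15` of that pattern is 18.09375, which matches the expected output quoted in the question.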

  • How are floats stored on your system? Maybe they are already stored according to IEEE. – Osiris Sep 23 '18 at 22:12
  • IMO it is bad practice to delete a downvoted question and repost it. I saw the old question at https://stackoverflow.com/questions/52469642/c-how-to-store-a-bit-pattern-for-ieee-floating-point-representation, although I do not have enough rep on SO to view the deleted question. – gammatester Sep 23 '18 at 22:17
  • @Osiris No, this is an assignment; I have to convert them myself. I think I know how to do that, I'm just unsure of how to store things along the way. – Bella Bear Sep 23 '18 at 22:22
  • You can store them in an int array, which is simple but memory consuming, or you can use bit operators to store single bits in an int. – Osiris Sep 23 '18 at 22:25
  • Get the significand, exponent and sign of the float, and then pick the bits you need by shifting and masking, moving them into a 15 (16?) bit integer type. Not too hard, but probably not doable in a portable way. – Rudy Velthuis Sep 23 '18 at 22:30
  • Take a look at the function `frexpf()`. This will get the significand (a.k.a. mantissa) and exponent. The sign can easily be detected by checking whether the float is < 0.0 or not. Only -0.0 might be hard to decode and will probably become +0.0. Now you just have to assemble the 16 bit float by picking (by shifting and masking) the appropriate bits and putting them into place (by shifting and or-ing). If you want to be really cool, you round the 9 bit mantissa before putting it into place, if you know how to do that. – Rudy Velthuis Sep 23 '18 at 22:38
  • @RudyVelthuis I don't think I'm allowed to use frexpf(). Do you know how I would go about storing it using bit operators? – Bella Bear Sep 23 '18 at 23:34
  • @BellaBear: yes, but I'd have to write a complete answer, and at the moment, I can't. And the code would be non-portable. Which compiler and which platform? – Rudy Velthuis Sep 23 '18 at 23:37
  • @BellaBear: take a look here: https://stackoverflow.com/questions/15685181/how-to-get-the-sign-mantissa-and-exponent-of-a-floating-point-number – Rudy Velthuis Sep 23 '18 at 23:45
  • There are numerous ways to store bits, and they are generally implemented with elementary operations of C. This is something that is usually covered in ordinary classes, and certainly prerequisites to any course analyzing the bits that represent floating-point numbers should include knowledge of bit manipulations. Often, unsigned integers are used to store bits. New bits are added to an unsigned integer by shifting the existing bits and using an OR to add the new bits. If an unsigned integer does not suffice, arrays of unsigned integers can be used.… – Eric Postpischil Sep 24 '18 at 20:26
  • … So, the general answer to a question about how to store bits while working with the representation of a floating-point number is “The usual ways.” If you are having problems implementing any of the usual ways, you should ask a more specific question. What do you know about bit operators? What is preventing you from simply shifting bits in an unsigned integer, using OR to add more, using AND to isolate specific bits, and so on? – Eric Postpischil Sep 24 '18 at 20:27
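
The shifting and masking approach mentioned in the comments can be sketched as follows. This is only an illustration and, as noted above, not portable: it assumes `float` is a 32-bit IEEE 754 single (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits). The struct `float_fields` and the function `split_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 32-bit float. */
typedef struct {
    unsigned sign;      /* 0 or 1 */
    int      exponent;  /* unbiased exponent (stored field minus 127) */
    uint32_t fraction;  /* 23 stored fraction bits, without the implicit 1 */
} float_fields;

/* Assumes float is IEEE 754 binary32; memcpy copies its raw bits. */
float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;

    memcpy(&bits, &f, sizeof bits);

    out.sign     = (unsigned)(bits >> 31);             /* top bit */
    out.exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* AND isolates 8 bits */
    out.fraction = bits & 0x7FFFFFu;                   /* low 23 bits */
    return out;
}
```

From these fields, the top 9 fraction bits for a smaller format could be taken with `fraction >> 14`, which truncates rather than rounds.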
