How to see the size of the incoming floating point number?

Question

The user types a number as input, and it is stored in a string. How can I check if this number is included in size in the float type or does it need a double?

Always (yes, **always**) use `double`. Don't even think about `float` or `long double` without a strong reason. *The teacher told me to use `float`* is only a strong reason after a request for clarification/justification. — pmg, May 10 '20 at 08:41
what kind of numbers are you talking about? `-7000000`? `0.0000000000042`? `3.14`? `3.14159265358979323846`? `7.3*10^365`? — pmg, May 10 '20 at 08:47
You need to make the decision at compile time really, which means that examining an incoming string isn't an option. — Weather Vane, May 10 '20 at 10:09
This seems like an XY problem. Typically, you do not choose the type of a variable depending on the input. You choose a type that can handle the input you want. — klutt, May 10 '20 at 11:26
@pmg: Oh, stop. Yes, this particular question has some mystery about why it is choosing between `float` and `double`, but the advice to, emphatically, always use `double` is inappropriate. People processing 8-bit sensor data with convolutions, FFTs, and similar algorithms do not need `double`, and its memory footprint is counterproductive. Most people doing neural network processing do not need `double`. — Eric Postpischil, May 10 '20 at 13:01
Shouldn't you also decide whether some size of INT would suffice? — Rick James, May 25 '20 at 03:46

Tarik · Accepted Answer · 2020-05-10T22:21:08.273

Unless your floating point numbers are huge or extremely small in magnitude, i.e. outside the range of roughly -3.4E38 to 3.4E38 or closer to zero than about 1.2E-38, a 32-bit float will hold anything you throw at it in terms of size, but not necessarily accuracy. As such, the real issue is how many significant digits you need in order to minimize rounding errors. I recommend reading https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf

If you are not limited by disk space or memory, then just go for a 64-bit double.
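As a rough illustration of the "how many significant digits" question, here is a minimal sketch that uses the standard `FLT_DIG` and `DBL_DIG` macros from `<float.h>`. The function name `type_for_digits` and its return codes are made up for this example; on IEEE 754 platforms the two macros are typically 6 and 15.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: pick the narrowest type whose guaranteed decimal
   precision (FLT_DIG / DBL_DIG) covers the requested number of
   significant decimal digits. */
char type_for_digits(int significant_digits) {
  assert(significant_digits > 0);
  if (significant_digits <= FLT_DIG) return 'f';  /* float is enough */
  if (significant_digits <= DBL_DIG) return 'd';  /* needs double */
  return 'n';                                     /* neither guarantees it */
}
```

This only addresses precision. A value can have few digits and still be outside the float range, so a range check would be needed as well.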

edited May 10 '20 at 22:21

answered May 10 '20 at 08:46

Tarik

Even within that range, there are plenty of values not representable with a float, so technically it won't store "anything you throw at it". Of course neither will a double, but it will be closer. – Felix G May 10 '20 at 11:05

chux - Reinstate Monica · Answer 2 · 2020-05-10T19:18:23.293

How can I check if this number is included in size in the float type or does it need a double?

Numbers encoded as strings offer limitless possibilities. Finite float and double are limited in range and precision.

Note that float is a subset of double.

The set of values of the type float is a subset of the set of values of the type double; C17dr § 6.2.5 10

Range

The range of double typically well exceeds that of float.

Precision

A typical float or double value is a dyadic rational: some integer multiplied or divided by some power of two. Conversion from a decimal string to floating point therefore usually involves some rounding. E.g. 0.1 is typically not exactly representable as a float or as a double.

This implies that most inexact conversions, even when within float range, will land closer to the original value as a double than as a float.
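A small sketch of that precision gap, assuming nothing beyond the standard `strtof()` and `strtod()`. The helper name `float_vs_double_gap` is invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: absolute difference between the float and the double
   that the same decimal string converts to. A result of 0 means the float
   conversion lost nothing relative to the double conversion. */
double float_vs_double_gap(const char *s) {
  assert(s != NULL);
  double d = strtod(s, NULL);
  double f = (double)strtof(s, NULL);  /* widening a float to double is exact */
  return f > d ? f - d : d - f;
}
```

For a string such as "0.5", which is exactly representable in both types, the gap is zero. For "0.1" it is small but nonzero, because the float and the double round to different nearby values.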

To meet OP's goal, I'd suggest converting the string to both and test the conversion results.

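#include <errno.h>
#include <stdlib.h>

// Returns 'f' if the string converts to the same value as float and as double,
// 'n' if it is not a number, and 'd' otherwise (including ERANGE cases).
int float_or_double_range(const char *s) {
  char *endptr;
  errno = 0;
  double d = strtod(s, &endptr);
  if (s == endptr) return 'n';  // Neither
  if (errno == ERANGE) return 'd';

  errno = 0;
  float f = strtof(s, &endptr);  // convert directly to float, not via double
  if (s == endptr) return 'd';
  if (errno == ERANGE) return 'd';

  if (d == f) return 'f'; // encodable as float and double
  return 'd';
}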

Notes:

Recall that the correctness of the FP strto...() functions is subject to quality-of-implementation issues, and that they may not provide the best answer in all cases.

To find if the converted string value is the same as a double and float, I recommend against converting the string to double and then the double to float. That involves double rounding and introduces errors in corner cases.
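To make the double-rounding concern concrete, here is a minimal sketch that compares the two conversion paths. The helper name `double_rounding_differs` is hypothetical, and the outcome depends on how accurately the implementation's `strtof()` and `strtod()` round.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 when converting the string straight to float
   gives a different value than converting to double first and then narrowing
   to float, 0 otherwise. */
int double_rounding_differs(const char *s) {
  assert(s != NULL);
  float direct = strtof(s, NULL);             /* one rounding step */
  float via_double = (float)strtod(s, NULL);  /* two rounding steps */
  return direct != via_double;
}
```

Most inputs give 0. A corner case is a decimal string just above the midpoint between two adjacent floats, such as "1.0000000596046447753906250000000001": with correctly rounded conversions in the default round-to-nearest mode, the double step lands exactly on the midpoint and the narrowing step then rounds the other way.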

RobertS supports Monica Cellio · Answer 3 · 2020-05-11T06:51:06.167

This answer only covers positive floats, but it might help you out:

A 32-bit float (4 bytes, single precision) as defined by IEEE 754 has a largest positive value of 3.40282 x 10^38; the smallest positive value is 1.17549 x 10^-38.

Use strtod() to convert the number in the string to a double. This is needed because you don't know in advance whether the number requires a double or not.

Then check if the number is within the range provided above.

If it is, allocate a float. If not, continue to use the double object.
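The steps above could be sketched roughly as follows. This is only an illustration: the name `in_float_range` and its return codes are invented, the bounds come from `FLT_MIN` and `FLT_MAX` in `<float.h>`, and accepting negative values and zero is an assumption that goes beyond the positive-only case described here.

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if the parsed magnitude lies in the normal float
   range [FLT_MIN, FLT_MAX] (or is exactly zero), 0 if it lies outside,
   -1 if the string does not start with a number.
   It checks range only, not precision. */
int in_float_range(const char *s) {
  assert(s != NULL);
  char *end;
  errno = 0;
  double d = strtod(s, &end);
  if (end == s) return -1;            /* nothing was parsed */
  if (errno == ERANGE) return 0;      /* does not even fit a double */
  double mag = d < 0 ? -d : d;
  if (mag == 0.0) return 1;           /* assumption: zero counts as fitting */
  return mag >= FLT_MIN && mag <= FLT_MAX;
}
```

A caller would then use a float when the result is 1 and keep the double otherwise. A value inside this range may still lose digits when stored as a float.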

This approach is a bit muddy, because you have already allocated a double and must then choose between keeping that double, which served as a buffer, and allocating a separate float. Nonetheless, it can be beneficial if, for example, you allocate an array based on the number in the string. You also have the option to allocate the buffer double dynamically and free() it after use.

A much simpler way would be to just choose a double from the start, so make sure this process is really required. Unless you are explicitly prohibited from using a double, just use a double.

This will save you a lot of time and effort, and it is also the safest way to go.

The largest number representable in IEEE-754 binary32 is ∞, not 3.40282e38. The latter is approximately the largest **finite** number representable, 2\*\*128−2\*\*104. — Eric Postpischil, May 10 '20 at 13:02
"smallest positive number" with [binary32](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) is more like [1.40129846e-45](https://stackoverflow.com/a/61467328/2410359). 1.17549 x 10^-38 is about the smallest normal positive `float`. — chux - Reinstate Monica, May 10 '20 at 20:08
