Check if a string representing float/unsigned int is too big

Question

I have a file containing strings representing float and uint64_t values.

I know exactly which string contains float values and which contains uint64_t values - that is not the problem I'm facing.

Here is how I convert them to their respective data-type:

char *t, *v;
uint64_t cn;

cn = strtoull(t, &v, 10);

char *tt, *vv;
float cn2;

cn2 = strtof(tt, &vv);

But there is an edge case I want to catch:

Let's say the string for the uint64_t is "99999999999999999999999999999999999999999999999999"

This value cannot be represented in 8 bytes, so the conversion overflows and cn becomes 18446744073709551615.

The same problem occurs with the float cn2.

How can I catch this?

Why do you limit yourself to `float` instead of the more common `double`? — Some programmer dude, Jul 14 '22 at 11:19
But using the similarly sized `uint64_t` is fine? Both it and `double` are eight bytes. What are your requirements? Are you targeting a small embedded system? Why then use `uint64_t` instead of the smaller `uint32_t`? Again, what are your requirements? What is the actual (underlying) problem you're trying to solve? On any modern PC-like system (including mobile phones and even some "embedded" devices and IoT hardware) such memory optimizations might be premature. — Some programmer dude, Jul 14 '22 at 11:22
@stht55 For your question, `errno` is definitely the way to go, as explained in Eric Postpischil's comment and Some programmer dude's answer. Just be aware that this is an unusual situation: The `strto*` functions are just about the only functions in the C library that work in this way. For these functions, setting `errno` to 0 beforehand and checking it afterwards is the way to go. But for most other functions, setting `errno` to 0 beforehand is meaningless — for most other functions, `errno` is only meaningful after the call *if the call has explicitly returned an error code*, like -1. — Steve Summit, Jul 14 '22 at 11:23
@Someprogrammerdude I have to do work with matrices in coo format. The values are in single precision. — stht55, Jul 14 '22 at 11:27
stht55, What should happen with input like 1) "-123", 2) "123abc", 3) "abc", 4) " "? — chux - Reinstate Monica, Jul 14 '22 at 11:54
@Reinstatemonica I check beforehand that the strings have the correct format. — stht55, Jul 14 '22 at 12:10

Eric Postpischil · Accepted Answer · 2022-07-14T12:28:42.613

3

strtoull provides an indication that the value is out of range. Consider this code:

#include <errno.h>
…
errno = 0; // Set error code to zero before call.
unsigned long long x = strtoull(t, &v, 10);
if (errno == ERANGE)
{
    // Handle out-of-range error.
}

This will reach the error case if the numeral in t is too large.
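A self-contained sketch of this check, wrapped in a hypothetical helper `parse_u64` (the name and the `bool` plus out-parameter interface are assumptions, not part of the answer). Besides testing `errno`, it rejects input in which no digits were converted and, on platforms where `unsigned long long` is wider than 64 bits, values above `UINT64_MAX`. It deliberately does not handle a leading minus sign.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: converts the decimal numeral in s to uint64_t.
   Returns false if no digits were converted, if strtoull() reported
   ERANGE, or if the value does not fit in uint64_t.
   A leading '-' is NOT rejected here (strtoull() negates such input). */
bool parse_u64(const char *s, uint64_t *out) {
  char *end;
  errno = 0;                          /* strtoull() only sets errno on error */
  unsigned long long x = strtoull(s, &end, 10);
  if (end == s)                       /* nothing was converted */
    return false;
  if (errno == ERANGE)                /* too large for unsigned long long */
    return false;
#if ULLONG_MAX > UINT64_MAX
  if (x > UINT64_MAX)                 /* fits unsigned long long, not uint64_t */
    return false;
#endif
  *out = (uint64_t)x;
  return true;
}
```

For the string from the question, `parse_u64` returns false, while "18446744073709551615" is accepted because it equals `UINT64_MAX` exactly.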

Note that if t contains a minus sign (but is not too large in magnitude), strtoull will return a value that is “negated” in the unsigned long long type (that is, a value wrapped modulo ULLONG_MAX+1) even though a negative value is out of range of the type; no error indication will be provided. So, if you want to detect all out-of-range cases, you must also check whether t begins with a minus sign (possibly after leading white space) while the return value is non-zero.
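A minimal sketch of that minus-sign test, assuming the caller passes the numeral together with the value strtoull returned for it. The function name `is_negative_numeral` is hypothetical. A result of zero means the input was "-0" (or equivalent), which is not out of range.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: given the numeral s and the value strtoull()
   returned for it, reports whether s denoted a negative number, which
   strtoull() silently wraps instead of flagging with ERANGE.
   "-0" yields 0 and is therefore not reported. */
bool is_negative_numeral(const char *s, unsigned long long result) {
  while (isspace((unsigned char)*s))  /* strtoull() also skips leading white space */
    s++;
  return *s == '-' && result != 0;
}
```

Calling it after a successful `strtoull(t, &v, 10)` closes the gap left by the `errno` check alone.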

strtof provides sufficient information to distinguish cases, per C 2018 7.22.1.3 10:

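If the value is positive and too large, HUGE_VALF is returned and errno is set to ERANGE (assuming the default rounding mode is in effect).
If the value is negative and too large, -HUGE_VALF is returned and errno is set to ERANGE.
If the value underflows, a value whose magnitude is no greater than the smallest normalized positive float (FLT_MIN) is returned; whether errno is set to ERANGE is implementation-defined.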
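A sketch that applies these rules to strtof; the enum and the `parse_float` name are hypothetical. Overflow and underflow are told apart by the magnitude of the returned value (above `FLT_MIN` means overflow) rather than by comparing with `HUGE_VALF`. Because setting `ERANGE` on underflow is implementation-defined, a `FLOAT_OK` result does not prove that no underflow occurred.

```c
#include <errno.h>
#include <float.h>
#include <stdlib.h>

enum float_conv { FLOAT_OK, FLOAT_NO_DIGITS, FLOAT_OVERFLOW, FLOAT_UNDERFLOW };

/* Hypothetical helper: converts s with strtof() and classifies the outcome. */
enum float_conv parse_float(const char *s, float *out) {
  char *end;
  errno = 0;                          /* must be cleared before the call */
  float f = strtof(s, &end);
  *out = f;
  if (end == s)                       /* nothing was converted */
    return FLOAT_NO_DIGITS;
  if (errno == ERANGE) {
    /* Overflow returns +/-HUGE_VALF; underflow returns a value whose
       magnitude is at most FLT_MIN. */
    return (f > FLT_MIN || f < -FLT_MIN) ? FLOAT_OVERFLOW : FLOAT_UNDERFLOW;
  }
  return FLOAT_OK;
}
```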

edited Jul 14 '22 at 12:28

answered Jul 14 '22 at 11:32

Eric Postpischil

When `unsigned long long` is wider than 64 bits, another test is warranted: `if (errno == ERANGE || x > UINT64_MAX)` – chux - Reinstate Monica Jul 14 '22 at 11:52
"But it will also reach the error case if the numeral in t is negative." Actually it won't, I just learnt. The standard says "If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type)". See this example: https://godbolt.org/z/sEo9qWc6x. Compare the first two tests - errno only gets set in the 3rd test (underflow/overflow). – Lundin Jul 14 '22 at 11:56
I just checked whether it works with `HUGE_VAL`. The string `"1.1111111111111111111111111111111111111111111111111111111111111112"` is converted to `"1.111111"` without `errno` being set to `ERANGE` – stht55 Jul 14 '22 at 12:08
@stht55: 1.1111111111111111111111111111111111111111111111111111111111111112 is not outside the finite range of `float`. It is between 1 and 2, both of which are representable in the format. It is not exactly representable in the format, and this is considered normal for floating-point operations. – Eric Postpischil Jul 14 '22 at 12:30
@EricPostpischil yes but in my case, no loss of precision is allowed so this should be caught. – stht55 Jul 14 '22 at 12:32
@stht55 Checking for _exactness_ (no loss of precision) is extremely problematic. Consider that [`FLT_MIN`](https://codereview.stackexchange.com/q/212490/29485), written as decimal text, may be dozens of (e.g. 89) characters long. `strtof()` and friends are not required to use more than a much smaller number of significant digits in the conversion. Achieving your _exactness_ goal likely requires hand-crafted conversion code, an error-prone proposition. – chux - Reinstate Monica Jul 14 '22 at 12:45
@stht55 If you really can't tolerate loss of precision, you might want to ask a separate question about that. In general, if you're converting a floating-point string in decimal to an actual floating-point value in binary, you virtually *always* have some loss of, or at least slight change in, precision. – Steve Summit Jul 14 '22 at 12:49
For example, when converting decimal strings to conventional IEEE754 `float`, the input `"1.0"` is exact, and the input `"1.00000011920928955078125"` is exact, but the seemingly simpler value `"1.0000001"` between them is not exact, and arguably suffers "precision loss". – Steve Summit Jul 14 '22 at 12:57
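One cheap way to flag the `"1.0000001"` kind of case is to compare the float conversion with the double conversion of the same string. This is a heuristic of our own, not something proposed in the thread, and the name `float_matches_double` is hypothetical. It assumes the library rounds both conversions correctly, and it cannot see digits beyond double precision: a string that is already inexact as a double can still pass (for example, one that differs from `1.00000011920928955078125` only in its 35th decimal place). NaN input always reports false.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Heuristic: true if converting s to float gives the same value as
   converting it to double, i.e. narrowing to float lost nothing that
   double precision could represent. Not a proof of exactness. */
bool float_matches_double(const char *s) {
  float f = strtof(s, NULL);
  double d = strtod(s, NULL);
  return (double)f == d;
}
```

With IEEE 754 types, `"1.0"` and `"1.00000011920928955078125"` pass, while `"1.0000001"` and the long `1.111...2` string from the comments are flagged.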

Some programmer dude · Answer 2 · 2022-07-14T11:26:45.723

1

According to the strtoull reference, when an overflow happens it returns ULLONG_MAX and sets errno to ERANGE.

So you should set errno to zero before the call, and afterwards check for a return value of ULLONG_MAX together with errno == ERANGE to detect overflow.
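A minimal sketch of that recipe; the name `strtoull_overflowed` is hypothetical. Requiring `errno == ERANGE` in addition to the `ULLONG_MAX` return value is what separates an input that genuinely equals `ULLONG_MAX` from one that overflowed.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Clear errno, convert, then require BOTH the ULLONG_MAX return value
   and errno == ERANGE before reporting an overflow. */
bool strtoull_overflowed(const char *s) {
  errno = 0;
  unsigned long long r = strtoull(s, NULL, 10);
  return r == ULLONG_MAX && errno == ERANGE;
}
```

On a platform with a 64-bit `unsigned long long`, "18446744073709551615" is not reported as an overflow, but "18446744073709551616" is.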

Similarly, on overflow strtof returns HUGE_VALF (or -HUGE_VALF for negative input) and sets errno to ERANGE.

edited Jul 14 '22 at 11:26

answered Jul 14 '22 at 11:18

Some programmer dude

Testing `HUGE_VALF` is inadequate as `HUGE_VALF` may be returned for a normal case where the value is not out of range, in C implementations where infinity is not represented in the floating-point format. Additionally, `errno` should be set to zero before the call. – Eric Postpischil Jul 14 '22 at 11:22
@Someprogrammerdude but what if the value is equal to `ULLONG_MAX`? – stht55 Jul 14 '22 at 11:59
@stht55 If the actual input in the string is the same as `ULLONG_MAX`, then `errno` should still be zero after the call. – Some programmer dude Jul 14 '22 at 12:03
@Someprogrammerdude I just checked whether it works with `HUGE_VAL`. The string "1.1111111111111111111111111111111111111111111111111111111111111112" is converted to "1.111111" without `errno` being set to `ERANGE` – stht55 Jul 14 '22 at 12:05
@stht55 That's not an *overflow*, it's just loss of precision. BTW, I'm pretty sure that 1.111111 can't be exactly represented by a `float`. – Bob__ Jul 14 '22 at 12:10
@stht55 That's not an out-of-range problem. Is your problem with the floating-point numbers that the input strings have much higher precision than `float` can handle? That's a completely different problem. Perhaps you might want to check the number of digits in the integer and decimal parts to see if they individually are more than a single-precision floating-point number can handle? – Some programmer dude Jul 14 '22 at 12:10
@Someprogrammerdude Yes, that might be an option. I just thought that this loss of precision would also set the `of` flag during the conversion. – stht55 Jul 14 '22 at 12:13
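The digit-counting idea from the comment above can be sketched as follows. It is a variation, not the commenter's exact proposal: instead of checking the integer and fractional parts separately, it counts significant digits (ignoring leading and trailing zeros) and compares the count with `FLT_DIG`, the number of decimal digits that `<float.h>` guarantees survive a decimal-to-float-to-decimal round trip. It assumes the string was already validated as a plain decimal numeral (optional sign, optional exponent) whose value lies in float's normal range; the function names are hypothetical.

```c
#include <ctype.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: counts significant decimal digits in a validated
   numeral such as "-0.00123400" or "1.5e10". Leading zeros and trailing
   zeros of the digit sequence are not counted; the exponent is ignored. */
static int significant_digits(const char *s) {
  int count = 0;          /* digits up to and including the last nonzero one */
  int pending_zeros = 0;  /* zeros seen since the last nonzero digit */
  bool started = false;   /* true once the first nonzero digit is seen */
  for (; *s != '\0' && *s != 'e' && *s != 'E'; s++) {
    if (!isdigit((unsigned char)*s))  /* skip sign and decimal point */
      continue;
    if (*s == '0') {
      if (started)
        pending_zeros++;
    } else {
      started = true;
      count += pending_zeros + 1;
      pending_zeros = 0;
    }
  }
  return count;
}

/* True if the numeral has at most FLT_DIG significant digits, so its
   decimal digits are guaranteed to survive a round trip through float. */
bool fits_flt_dig(const char *s) {
  return significant_digits(s) <= FLT_DIG;
}
```

This is conservative: with IEEE 754 `float` (`FLT_DIG` is 6), "1.00000011920928955078125" is exactly representable yet rejected, while "0.1" is accepted even though it is not exact in binary.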

chux - Reinstate Monica · Answer 3 · 2022-07-15T12:36:47.420

If values less than 0 need detection

strtoull("-1", ..., ...) "wraps" and returns ULLONG_MAX.

strto(u)ll() converts to an integer type of at least 64 bits. Plan for future growth, where unsigned long long may be wider than 64 bits.

Example:

#include <errno.h>
#include <limits.h>   // ULLONG_MAX, needed by the #if below
#include <stdint.h>
#include <stdlib.h>

uint64_t convert_str_uint64(const char *s, char **endptr, int base) {
  char *end;  // Local end pointer, so callers may pass endptr == NULL as with strtoull()
  errno = 0;
  long long ival = strtoll(s, &end, base);
  // Look for negative numbers
  if (ival < 0) {
    if (endptr) *endptr = end;
    errno = ERANGE;
    return 0;
  }
  // We are done for many positive numbers
  if (end > s && ival <= INT64_MAX && errno == 0) {
    if (endptr) *endptr = end;
    return (uint64_t) ival;
  }

  // Input may still be a valid value more than INT64_MAX.
  errno = 0;
  unsigned long long uval = strtoull(s, endptr, base);

#if ULLONG_MAX > UINT64_MAX
  if (uval > UINT64_MAX) {
    uval = UINT64_MAX;
    errno = ERANGE;
  }
#endif

  return (uint64_t) uval;
}
