
I'd like to know if there is an easy way to determine the maximum number of characters required to print a decimal int.

I know <limits.h> contains definitions like INT_MAX that give the maximum value an int can assume, but that is not what I want.

I'd like to be able to do something like:

int get_int( void )
{
    char draft[ MAX_CHAR_OF_A_DECIMAL_INT ];

    fgets( draft, sizeof( draft ), stdin );
    return strtol( draft, NULL, 10 );
}

But how can I find the value of MAX_CHAR_OF_A_DECIMAL_INT in a portable, low-overhead way?

Thanks!

j4x
  • Couldn't you take INT_MAX, convert to a string, and count the length, and then add one (to allow for a leading -) – Tyler Eaves May 10 '12 at 14:31
  • 2
    Presumably you don't actually need the maximum possible length, just a number greater than or equal to that, and not so huge as to be very wasteful? `BIG_ENOUGH_FOR_AN_INT`, rather than `BIGGEST_AN_INT_CAN_BE`. – Steve Jessop May 10 '12 at 14:43

8 Answers


If you assume CHAR_BIT is 8 (required on POSIX, so a safe assumption for any code targeting POSIX systems as well as other mainstream systems like Windows), a cheap safe formula is 3*sizeof(int)+2. If not, you can make it 3*sizeof(int)*CHAR_BIT/8+2.

In case you're interested in the reason this works, sizeof(int) is essentially a logarithm of INT_MAX (roughly log base 2^CHAR_BIT), and conversion between logarithms of different bases (e.g. to base 10) is just multiplication. In particular, 3 is an integer approximation/upper bound on log base 10 of 256.

The +2 is to account for a possible sign and null termination.

R.. GitHub STOP HELPING ICE
  • 2
    Derivation: it takes an average of 3.2 bits to represent a decimal digit; each 8-bit byte can represent an average of 2.5 decimal digits; rounding up gives you 3 (hence `3 * sizeof (int)`). Then you need an extra character for the sign, and an extra character for the 0 terminator (hence the `+ 2`). – John Bode May 10 '12 at 15:38

The simplest, canonical, and arguably most portable way is to ask snprintf() how much space would be required:

char sbuf[2];
int ndigits;

ndigits = snprintf(sbuf, (size_t) 1, "%lld", (long long) INT_MIN);

or, perhaps slightly less portably, using intmax_t and %jd:

ndigits = snprintf(sbuf, (size_t) 1, "%jd", (intmax_t) INT_MIN);

One could consider that too expensive to do at runtime, but it works for any value, not just the MIN/MAX values of any integer type.

You could of course also just directly calculate the number of digits that a given integer would require to be expressed in Base 10 notation with a simple recursive function:

unsigned int
numCharsB10(intmax_t n)
{
        if (n < 0)
                return numCharsB10((n == INTMAX_MIN) ? INTMAX_MAX : -n) + 1;
        if (n < 10)
                return 1;

        return 1 + numCharsB10(n / 10);
}

but that of course also requires CPU at runtime, even when inlined, though perhaps a little less than snprintf() does.

@R.'s answer above, though, is more or less wrong, but it is on the right track. Here's the correct derivation of some very well and widely tested, highly portable macros that implement the calculation at compile time using sizeof(), starting from a slight correction of @R.'s initial wording:

First we can easily see (or show) that sizeof(int) is the log base 2 of UINT_MAX divided by the number of bits represented by one unit of sizeof() (8, aka CHAR_BIT):

sizeof(int) == log2(UINT_MAX) / 8

because UINT_MAX is of course just 2^(sizeof(int) * 8) - 1 (effectively 2^(sizeof(int) * 8) for this purpose) and log2(x) is the inverse of 2^x.

We can use the identity "logb(x) = log(x) / log(b)" (where log() is the natural logarithm) to find logarithms of other bases. For example, you could compute the "log base 2" of "x" using:

log2(x) = log(x) / log(2)

and also:

log10(x) = log(x) / log(10)

So, we can deduce that:

log10(v) = log2(v) / log2(10)

Now what we want in the end is the log base 10 of UINT_MAX, so since log2(10) is approximately 3, and since we know from above what log2() is in terms of sizeof(), we can say that log10(UINT_MAX) is approximately:

log10(2^(sizeof(int)*8)) ~= (sizeof(int) * 8) / 3

That's not perfect though, especially since what we really want is the ceiling value, but with some minor adjustment to account for the integer rounding of log2(10) to 3, we can get what we need by first adding one to the log2 term, then subtracting 1 from the result for any larger-sized integer, resulting in this "good-enough" expression:

#if 0
#define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))
#endif

Even better we can multiply our first log2() term by 1/log2(10) (multiplying by the reciprocal of the divisor is the same as dividing by the divisor), and doing so makes it possible to find a better integer approximation. I most recently (re?)encountered this suggestion while reading Sean Anderson's bithacks: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10

To do this with integer math to the best approximation possible, we need to find the ideal ratio representing our reciprocal. This can be found by searching for the smallest fractional part of multiplying our desired value of 1/log2(10) by successive powers of 2, within some reasonable range of powers of 2, such as with the following little AWK script:

    awk 'BEGIN {
            minf=1.0
    }
    END {
            for (i = 1; i <= 31; i++) {
                    a = 1.0 / (log(10) / log(2)) * 2^i
                    if (a > (2^32 / 32))
                            break;
                    n = int(a)
                    f = a - (n * 1.0)
                    if (f < minf) {
                            minf = f
                            minn = n
                            bits = i
                    }
                    # printf("a=%f, n=%d, f=%f, i=%d\n", a, n, f, i)
            }
            printf("%d + %f / %d, bits=%d\n", minn, minf, 2^bits, bits)
    }' < /dev/null

    1233 + 0.018862 / 4096, bits=12

So we can get a good integer approximation of multiplying our log2(v) value by 1/log2(10) by multiplying it by 1233 followed by a right-shift of 12 (2^12 is 4096 of course):

log10(UINT_MAX) ~= ((sizeof(int) * 8) + 1) * 1233 >> 12

and, together with adding one to do the equivalent of finding the ceiling value, that gets rid of the need to fiddle with odd values:

#define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)

/*
 * for signed types we need room for the sign, except for int64_t
 */
#define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
    (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))

/*
 * NOTE: this gives a warning (for unsigned types of int and larger) saying
 * "comparison of unsigned expression < 0 is always false", and of course it
 * is, but that's what we want to know (if indeed type 't' is unsigned)!
 */
#define __MAX_B10STRLEN_FOR_INT_TYPE(t)                     \
    (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t)      \
                  : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

Normally the compiler will evaluate at compile time the expression that my __MAX_B10STRLEN_FOR_INT_TYPE() macro becomes. Of course my macro always calculates the maximum space required by a given type of integer, not the exact space required by a particular integer value.

Greg A. Woods

I don't know whether there is any trick to do what you want in plain ANSI C, but in C++ you can easily use template metaprogramming to do it:

#include    <iostream>
#include    <limits>
#include    <climits>

template< typename T, unsigned long N = INT_MAX >
class   MaxLen
{
public:
    enum
    {
        StringLen = MaxLen< T, N / 10 >::StringLen + 1
    };
};

template< typename T >
class   MaxLen< T, 0 >
{
public:
    enum
    {
        StringLen = 1
    };
};

And you can call it from your pure-C code creating an additional C++ function like this:

extern "C"
int int_str_max( )
{
    return  MaxLen< int >::StringLen;
}

This has a ZERO execution time overhead and calculates the exact space needed.


You can test the above templates with something like:

int main( )
{
std::cout << "Max: " << std::numeric_limits< short >::max( ) << std::endl;
std::cout << "Digits: " << std::numeric_limits< short >::digits10 << std::endl;
std::cout << "A \"short\" is " << sizeof( short ) << " bytes." << std::endl
    << "A string large enough to fit any \"short\" is "
    << MaxLen< short, SHRT_MAX >::StringLen << " bytes wide." << std::endl;

std::cout << "Max: " << std::numeric_limits< int >::max( ) << std::endl;
std::cout << "Digits: " << std::numeric_limits< int >::digits10 << std::endl;
std::cout << "An \"int\" is " << sizeof( int ) << " bytes." << std::endl
    << "A string large enough to fit any \"int\" is "
    << MaxLen< int >::StringLen << " bytes wide." << std::endl;

std::cout << "Max: " << std::numeric_limits< long >::max( ) << std::endl;
std::cout << "Digits: " << std::numeric_limits< long >::digits10 << std::endl;
std::cout << "A \"long\" is " << sizeof( long ) << " bytes." << std::endl
    << "A string large enough to fit any \"long\" is "
    << MaxLen< long, LONG_MAX >::StringLen << " bytes wide." << std::endl;

    return  0;
}

The output is:

Max: 32767
Digits: 4
A "short" is 2 bytes.
A string large enough to fit any "short" is 6 bytes wide.
Max: 2147483647
Digits: 9
An "int" is 4 bytes.
A string large enough to fit any "int" is 11 bytes wide.
Max: 9223372036854775807
Digits: 18
A "long" is 8 bytes.
A string large enough to fit any "long" is 20 bytes wide.
  • Note the slightly different values from std::numeric_limits< T >::digits10 and MaxLen< T, N >::StringLen, as the former does not take into account digits if it can't reach '9'. Of course you can use it and simply add two if you don't mind wasting a single byte in some cases.

EDIT:

Some may find it weird to include <climits>. If you can count on C++11, you won't need it, and you will gain additional simplicity:

#include    <iostream>
#include    <limits>

template< typename T, unsigned long N = std::numeric_limits< T >::max( ) >
class   MaxLen
{
public:
    enum
    {
        StringLen = MaxLen< T, N / 10 >::StringLen + 1
    };
};

template< typename T >
class   MaxLen< T, 0 >
{
public:
    enum
    {
        StringLen = 1
    };
};

Now you can use

MaxLen< short >::StringLen

instead of

MaxLen< short, SHRT_MAX >::StringLen

Nice, isn't it?

JaxWR
    I think I can live with `std::numeric_limits< T >::digits10 + 2` and waste a byte. This seems simple yet fast. Thanks. – j4x May 10 '12 at 17:26
    First off, C++ != C Secondly, that's a horrible amount of complexity to do something that can be done in both C and C++ in a relatively simple expression using sizeof(). – Greg A. Woods Nov 24 '12 at 22:27
    _Of course you can use it and simply add two if you don't care wasting a single byte in some_ -- **why add 2 and not just 1 digit?** Is it for the sign? is it for the NULL character? Be more explicit. – haelix Oct 08 '18 at 14:14

The maximum number of decimal digits d of a signed or unsigned integer x of b bits matches the number of decimal digits of the number 2^b. In the case of signed numbers, an extra character must be added for the sign.

The number of decimal digits of x can be calculated as log_10(x), rounded up.

Therefore, the maximum number of decimal digits of x will be log_10(2^b) = b * log_10(2) = b * 0.301029995663981, rounded up.

If s is the size in bytes (given by the sizeof operator) of a certain type of integer used to store x, its size b in bits will be b = s * 8. So, the maximum number of decimal digits d will be (s * 8) * 0.301029995663981, rounded up. Rounding up will consist of truncating (converting to an integer), and adding 1.

Of course, 1 must be added to all these constants to account for the final 0 byte (see IntegerString in the following example).

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define COMMON_LOG_OF_2 0.301029995663981

#define MAX_DECIMAL_DIGITS_UCHAR        ((unsigned) (sizeof (unsigned char     ) * 8 * COMMON_LOG_OF_2) + 1)
#define MAX_DECIMAL_DIGITS_USHORT       ((unsigned) (sizeof (unsigned short    ) * 8 * COMMON_LOG_OF_2) + 1)
#define MAX_DECIMAL_DIGITS_UINT         ((unsigned) (sizeof (unsigned int      ) * 8 * COMMON_LOG_OF_2) + 1)
#define MAX_DECIMAL_DIGITS_ULONG        ((unsigned) (sizeof (unsigned long     ) * 8 * COMMON_LOG_OF_2) + 1)
#define MAX_DECIMAL_DIGITS_ULONGLONG    ((unsigned) (sizeof (unsigned long long) * 8 * COMMON_LOG_OF_2) + 1)
#define MAX_DECIMAL_DIGITS_UINT128      ((unsigned) (sizeof (unsigned __int128 ) * 8 * COMMON_LOG_OF_2) + 1)

#define MAX_DECIMAL_DIGITS_CHAR         (1 + MAX_DECIMAL_DIGITS_UCHAR    )
#define MAX_DECIMAL_DIGITS_SHORT        (1 + MAX_DECIMAL_DIGITS_USHORT   )
#define MAX_DECIMAL_DIGITS_INT          (1 + MAX_DECIMAL_DIGITS_UINT     )
#define MAX_DECIMAL_DIGITS_LONG         (1 + MAX_DECIMAL_DIGITS_ULONG    )
#define MAX_DECIMAL_DIGITS_LONGLONG     (1 + MAX_DECIMAL_DIGITS_ULONGLONG)
#define MAX_DECIMAL_DIGITS_INT128       (1 + MAX_DECIMAL_DIGITS_UINT128  )

int main (void)
{
    char IntegerString[MAX_DECIMAL_DIGITS_INT + 1];

    printf ("MAX_DECIMAL_DIGITS_UCHAR     = %2u\n",MAX_DECIMAL_DIGITS_UCHAR    );
    printf ("MAX_DECIMAL_DIGITS_USHORT    = %2u\n",MAX_DECIMAL_DIGITS_USHORT   );
    printf ("MAX_DECIMAL_DIGITS_UINT      = %2u\n",MAX_DECIMAL_DIGITS_UINT     );
    printf ("MAX_DECIMAL_DIGITS_ULONG     = %2u\n",MAX_DECIMAL_DIGITS_ULONG    );
    printf ("MAX_DECIMAL_DIGITS_ULONGLONG = %2u\n",MAX_DECIMAL_DIGITS_ULONGLONG);
    printf ("MAX_DECIMAL_DIGITS_UINT128   = %2u\n",MAX_DECIMAL_DIGITS_UINT128  );

    printf ("MAX_DECIMAL_DIGITS_CHAR      = %2u\n",MAX_DECIMAL_DIGITS_CHAR     );
    printf ("MAX_DECIMAL_DIGITS_SHORT     = %2u\n",MAX_DECIMAL_DIGITS_SHORT    );
    printf ("MAX_DECIMAL_DIGITS_INT       = %2u\n",MAX_DECIMAL_DIGITS_INT      );
    printf ("MAX_DECIMAL_DIGITS_LONG      = %2u\n",MAX_DECIMAL_DIGITS_LONG     );
    printf ("MAX_DECIMAL_DIGITS_LONGLONG  = %2u\n",MAX_DECIMAL_DIGITS_LONGLONG );
    printf ("MAX_DECIMAL_DIGITS_INT128    = %2u\n",MAX_DECIMAL_DIGITS_INT128   );

    sprintf (IntegerString,"%d",INT_MAX);
    printf ("INT_MAX       = %d\n",INT_MAX);
    printf ("IntegerString = %s\n",IntegerString);

    sprintf (IntegerString,"%d",INT_MIN);
    printf ("INT_MIN       = %d\n",INT_MIN);
    printf ("IntegerString = %s\n",IntegerString);

    return EXIT_SUCCESS;
}

EDIT:

Unfortunately, the use of floating point may cause problems when evaluating the expressions as constants: they are not integer constant expressions, so for example the array declaration below would formally be a VLA. I have modified them to multiply by log10(2) * 2^11 = 616 and shift right by 8 bits (8 rather than 11 because sizeof() counts bytes of 2^3 bits each), so that all calculations are performed by the compiler with integers:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define LOG2_x_2_11 616 // log10(2) * 2^11

#define MAX_DECIMAL_DIGITS_UCHAR        (((sizeof (unsigned char     ) * LOG2_x_2_11) >> 8) + 1)
#define MAX_DECIMAL_DIGITS_USHORT       (((sizeof (unsigned short    ) * LOG2_x_2_11) >> 8) + 1)
#define MAX_DECIMAL_DIGITS_UINT         (((sizeof (unsigned int      ) * LOG2_x_2_11) >> 8) + 1)
#define MAX_DECIMAL_DIGITS_ULONG        (((sizeof (unsigned long     ) * LOG2_x_2_11) >> 8) + 1)
#define MAX_DECIMAL_DIGITS_ULONGLONG    (((sizeof (unsigned long long) * LOG2_x_2_11) >> 8) + 1)
#define MAX_DECIMAL_DIGITS_UINT128      (((sizeof (unsigned __int128 ) * LOG2_x_2_11) >> 8) + 1)

#define MAX_DECIMAL_DIGITS_CHAR     (1 + MAX_DECIMAL_DIGITS_UCHAR    )
#define MAX_DECIMAL_DIGITS_SHORT    (1 + MAX_DECIMAL_DIGITS_USHORT   )
#define MAX_DECIMAL_DIGITS_INT      (1 + MAX_DECIMAL_DIGITS_UINT     )
#define MAX_DECIMAL_DIGITS_LONG     (1 + MAX_DECIMAL_DIGITS_ULONG    )
#define MAX_DECIMAL_DIGITS_LONGLONG (1 + MAX_DECIMAL_DIGITS_ULONGLONG)
#define MAX_DECIMAL_DIGITS_INT128   (1 + MAX_DECIMAL_DIGITS_UINT128  )

int main (void)
{
    char IntegerString[MAX_DECIMAL_DIGITS_INT + 1];

    printf ("MAX_DECIMAL_DIGITS_UCHAR     = %2zu\n",MAX_DECIMAL_DIGITS_UCHAR    );
    printf ("MAX_DECIMAL_DIGITS_USHORT    = %2zu\n",MAX_DECIMAL_DIGITS_USHORT   );
    printf ("MAX_DECIMAL_DIGITS_UINT      = %2zu\n",MAX_DECIMAL_DIGITS_UINT     );
    printf ("MAX_DECIMAL_DIGITS_ULONG     = %2zu\n",MAX_DECIMAL_DIGITS_ULONG    );
    printf ("MAX_DECIMAL_DIGITS_ULONGLONG = %2zu\n",MAX_DECIMAL_DIGITS_ULONGLONG);
    printf ("MAX_DECIMAL_DIGITS_UINT128   = %2zu\n",MAX_DECIMAL_DIGITS_UINT128  );

    printf ("MAX_DECIMAL_DIGITS_CHAR      = %2zu\n",MAX_DECIMAL_DIGITS_CHAR     );
    printf ("MAX_DECIMAL_DIGITS_SHORT     = %2zu\n",MAX_DECIMAL_DIGITS_SHORT    );
    printf ("MAX_DECIMAL_DIGITS_INT       = %2zu\n",MAX_DECIMAL_DIGITS_INT      );
    printf ("MAX_DECIMAL_DIGITS_LONG      = %2zu\n",MAX_DECIMAL_DIGITS_LONG     );
    printf ("MAX_DECIMAL_DIGITS_LONGLONG  = %2zu\n",MAX_DECIMAL_DIGITS_LONGLONG );
    printf ("MAX_DECIMAL_DIGITS_INT128    = %2zu\n",MAX_DECIMAL_DIGITS_INT128   );

    sprintf (IntegerString,"%d",INT_MAX);
    printf ("INT_MAX       = %d\n",INT_MAX);
    printf ("IntegerString = %s\n",IntegerString);

    sprintf (IntegerString,"%d",INT_MIN);
    printf ("INT_MIN       = %d\n",INT_MIN);
    printf ("IntegerString = %s\n",IntegerString);

    return EXIT_SUCCESS;
}

(Added 2+ years after the accepted answer.)

The fraction 10/33 exactly meets the needs for unpadded int8_t, int16_t, int32_t and int128_t. It is only 1 char over for int64_t, and exact or 1 over for all integer sizes up to int362_t. Beyond that it may be more than 1 over.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#define MAX_CHAR_LEN_DECIMAL_INTEGER(type) (10*sizeof(type)*CHAR_BIT/33 + 2)
#define MAX_CHAR_SIZE_DECIMAL_INTEGER(type) (10*sizeof(type)*CHAR_BIT/33 + 3)

int get_int( void ) {
                                            //   + 1 for the \n of fgets()
  char draft[MAX_CHAR_SIZE_DECIMAL_INTEGER(long) + 1];  //**

  fgets(draft, sizeof draft, stdin);
  return strtol(draft, NULL, 10);
}

** fgets() typically works best with an additional char for the terminating '\n'.

Similar to @R.. but with a better fraction.


I recommend using generous (2x) buffers when reading user input. Sometimes a user adds spaces, leading zeros, etc.

  char draft[2*(MAX_CHAR_SIZE_DECIMAL_INTEGER(long) + 1)];
  fgets(draft, sizeof draft, stdin);
chux - Reinstate Monica

In C++11 and later, you can do the following (note that std::max is constexpr only from C++14 on; in strict C++11, replace it with a conditional expression):

#include <algorithm>    // std::max
#include <cstddef>      // size_t
#include <limits>       // std::numeric_limits
#include <type_traits>  // std::is_signed

namespace details {
    template<typename T>
    constexpr size_t max_to_string_length_impl(T value) {
        return (value >= 0 && value < 10) ? 1                            // [0..9] -> 1
            : (std::is_signed<T>::value && value < 0 && value > -10) ? 2 // [-9..-1] -> 2
            : 1 + max_to_string_length_impl(value / 10);                 // ..-10] [10.. -> recursion
    }
}

template<typename T>
constexpr size_t max_to_string_length() { 
    return std::max(
        details::max_to_string_length_impl(std::numeric_limits<T>::max()),
        details::max_to_string_length_impl(std::numeric_limits<T>::min())); 
}
Hrissan

You can calculate the number of digits using log base 10. On my system, calculating the ceiling of log base 2 via the bit representation of the number didn't provide any significant speed gain. The floor of log base 10, plus 1, gives the number of digits; I add 2 more to account for the sign and the terminating null character.

#include <limits.h>
#include <stdio.h>
#include <math.h>

int main(void){
  printf("%d %d\n", INT_MAX, (int)floor(log10(INT_MAX)) + 3);

  return 0;
}

Also note that an int can be 2 or 4 bytes (2 only on old systems), so you could calculate the upper bound for the 4-byte case and use that in your program.


Here's the C version:

#include <limits.h>

#define xstr(s) str(s)
#define str(s) #s
#define INT_STR_MAX sizeof(xstr(INT_MAX))

char buffer[INT_STR_MAX]; /* note: sized from INT_MAX's digits plus NUL, so one char short for INT_MIN's leading '-' */

Then:

$ gcc -E -o str.cpp str.c
$ grep buffer str.cpp
char buffer[sizeof("2147483647")];

$ gcc -S -o str.S str.c
$ grep buffer str.S
    .comm   buffer,11,1
Alun