
I'm trying to get SIZE_MAX in C89.

I thought of the following way to find SIZE_MAX:

const size_t SIZE_MAX = -1;

Since the standard (§6.2.1.2 ANSI C) says:

When a signed integer is converted to an unsigned integer with equal or greater size, if the value of the signed integer is nonnegative, its value is unchanged. Otherwise: if the unsigned integer has greater size, the signed integer is first promoted to the signed integer corresponding to the unsigned integer; the value is converted to unsigned by adding to it one greater than the largest number that can be represented in the unsigned integer type. [28]

With footnote 28:

In a two's-complement representation, there is no actual change in the bit pattern except filling the high-order bits with copies of the sign bit if the unsigned integer has greater size.

This seems to have defined behavior, but I'm not quite sure I understand the wording of that paragraph correctly.

Note that this question is explicitly about C89, so this doesn't answer my question because the standard has different wording.

If that doesn't work, the other way I came up with is:

size_t get_size_max() {
    static size_t max = 0;
    if (max == 0) {
        max -= 1U;
    }

    return max;
}

But I couldn't find anything about unsigned integer underflow in the standard, so I'm poking in the dark here.

FSMaxB
  • There is plenty about unsigned integer overflow in the standard. Essentially it uses modulo arithmetic. – Peter Jun 07 '17 at 01:22
  • @Peter: Well I didn't find it. The word underflow is used 4 times in the copy that I have and none of it is about unsigned types. Also modulo arithmetic is never mentioned, just the word "modulo". And normal modulo isn't necessarily defined for negative values. – FSMaxB Jun 07 '17 at 01:31
  • Same as https://stackoverflow.com/questions/3472311/what-is-a-portable-method-to-find-the-maximum-value-of-size-t – AnT stands with Russia Jun 07 '17 at 02:58
  • @FSMaxB - you haven't read closely enough and the standards don't spoonfeed like you are expecting. The 1999 C standard, section 6.2.5, second sentence of para 9. "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.". That describes the mathematical notion of modulo arithmetic. I don't have the 1989 standard handy right now, but it definitely has words with the same net meaning. – Peter Jun 07 '17 at 05:07
  • `SIZE_MAX` is not necessarily the same thing as `~(size_t)0` nor `(size_t)-1`, so the assumption that you can 'determine' it is flawed from the start. `SIZE_MAX` is something set by the compiler. [See this](https://stackoverflow.com/questions/42574890/why-is-the-maximum-size-of-an-array-too-large). – Lundin Jun 07 '17 at 07:37
  • @Peter Yes, C89 has the same paragraph essentially. It's just that a negative value modulo isn't necessarily defined. Even Wikipedia says: "When either a or n is negative, the naive definition breaks down and programming languages differ in how these values are defined." And the standard doesn't explicitly specify modular arithmetic! – FSMaxB Jun 07 '17 at 07:47
  • @Lundin Determining it is not possible in the first place because it doesn't exist in C89. The goal is just to get the biggest representable value in size_t. Nothing more. – FSMaxB Jun 07 '17 at 07:53
  • @FSMaxB Which, as I just explained, cannot be determined by anyone else but the compiler. Regardless of if that constant exists in the standard library or not. – Lundin Jun 07 '17 at 08:00
  • @Lundin If I and M.M understand the standard correctly, (size_t)-1 is exactly that. The question you linked to is about actually creating objects that have a size of SIZE_MAX, which is not my question at all. – FSMaxB Jun 07 '17 at 08:25
  • @FSMaxB - The C standard trumps wikipedia. C99, Section 6.3.1.3, para 2 in discussing conversion between signed and unsigned types says "...if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." Although it doesn't use words "modulo arithmetic", again it is describing the same concept - in a way that converting a negative integral value to `size_t` produces a value between 0 and the maximum value a `size_t` can represent. – Peter Jun 07 '17 at 12:03
  • @Peter: Please read my question. It is about C89, not C99. The C89 doesn't contain any paragraph that talks about repeatedly subtracting or adding. – FSMaxB Jun 07 '17 at 12:42
  • @FSMaxB - on current machine, I have access to the C99 standard, not C89. However, one feature of C standards is that wording of some topics changed between versions without changing the meaning. One thing I do remember however: the descriptions in all C standards of conversions to unsigned types have described the concept of modulo arithmetic, even if the wording changed with time. In fact, the answer you accepted relies on that property: `-1`, when converted to `size_t`, gives the maximum value a `size_t` can represent. It was also a feature of K&R C (i.e. it predated C89). – Peter Jun 08 '17 at 09:57

2 Answers


You could use:

#ifndef SIZE_MAX
#define SIZE_MAX ((size_t)(-1))
#endif

The behaviour of converting -1 to unsigned integer type is defined under section C11 6.3.1.3 "Conversions - Signed and unsigned integers". C89 had an equivalent definition, numbered 3.2.1.2. In fact you quoted the ISO C90 definition 6.2.1.2 in your question (the difference between ANSI C89 and ISO C90 is that the sections are numbered differently).

I would not recommend using a const variable, since in C it is not a constant expression and so cannot be used where one is required (for example, as an array size or in a static initializer).


Note: This can't be used in C90 preprocessor arithmetic, which only works on integer constant expressions that contain no casts or words, so we can't use any sizeof tricks. In that case you might need a system-specific definition; there's no standard way for the preprocessor to detect a typedef.

M.M
  • You can't cast to `size_t` in preprocessor arithmetic, though, since `size_t` is a typedef. – Dietrich Epp Jun 07 '17 at 01:18
  • @M.M Thanks. I'm not too concerned about preprocessor arithmetic. It would be good to know how to do that though for others that might want to use it in the preprocessor step. – FSMaxB Jun 07 '17 at 01:25
  • If you happen to need it as a value, compile and run `#include <stdlib.h> #include <stdio.h> int main(void) { printf("#define SIZE_MAX %zu\n", (size_t)(-1)); return EXIT_SUCCESS; }` and redirect the output to a dedicated header file, say `size_max.h`, that is included in the other source files. If you use a Makefile, it's actually quite easy to set up. – Nominal Animal Jun 07 '17 at 01:34
  • @NominalAnimal I think that's worth posting as an additional Answer. Although it would not work for cross-compilation , if the binary cannot execute on the build system. – M.M Jun 07 '17 at 01:35
  • @NominalAnimal Perhaps `printf("#define SIZE_MAX %zuu\n", (size_t)(-1));`, `u` added to ensure the value is treated as some unsigned type when it can be. – chux - Reinstate Monica Jun 07 '17 at 02:38
  • @chux: I've seen `SIZE_MAX` defined with `U` suffix on ILP32 systems, and with `UL` on `LP64` systems. We could check `CHAR_BIT * sizeof (size_t)`, but does that make sense? Or just default to `UL` suffix, as it should not cause harm on any architecture? – Nominal Animal Jun 07 '17 at 02:43
  • @NominalAnimal Compiling and running your example program is not an option, because it depends on running code on the destination platform and it relies on a GCC specific printf format specifier. – FSMaxB Jun 07 '17 at 07:49
  • @FSMaxB: `%z` is C99, not a GNU extension, but you could just use `printf("%luu\n", (unsigned long)(size_t)(-1));` instead. You do not normally need the `_MAX` and similar macros to be numerical constants anyway, since you are unlikely to do preprocessor arithmetic (`#if SIZE_MAX < 5` or similar) on them, but **if** you did, you would either define them "by hand", or run a compile-time header-builder helper program to do so. I shall add an explanation to the beginning of my extended comment "answer". – Nominal Animal Jun 07 '17 at 12:08
  • @NominalAnimal: C99 doesn't work either. If I were able to use C99, I would use SIZE_MAX directly, since SIZE_MAX is defined in C99. – FSMaxB Jun 07 '17 at 12:46
  • @FSMaxB: I meant, `%z` is not a GNU extension, it is from C99. I did not initially notice the C89 in your question, and that was why I used `%z`. In C89, you can use `printf("%luu\n", (unsigned long)(size_t)(-1));` to print the value with an `u` suffix. In general, the above answer by M.M is what I too would suggest you use. My extended comment below is for those others who read your question, but cannot use this answer by M.M, because they happen to need the macro, or something similar, as a preprocessor numerical constant. – Nominal Animal Jun 07 '17 at 14:43

I recommend using the macro definition as described in M.M's answer.

In some cases, you might need a similar macro, but as a numerical constant, so that you can use it in preprocessor directives like #if VALUE > 42 ... #endif. I commented that in such cases, a helper program can be run at compile time, to compute and print a header file defining such constants.

Obviously, this will not work when cross-compiling to a different architecture; in that case, the header file must be provided in some other way. (For example, the project could have a subdirectory of pre-generated headers, and a list of known architectures for each, so that the user can simply copy the header file into place.)

Creating a Makefile and associated facilities for running such programs (and only if the user did not copy the header file into place) is not difficult.

First, let's say your program consists of two source files, foo.c:

#include <stdlib.h>
extern void hello(void);

int main(void)
{
    hello();
    return EXIT_SUCCESS;
}

and a bar.c:

#include <stdio.h>
#include "size_max.h"

#define  STRINGIFY_(s) #s
#define  STRINGIFY(s) STRINGIFY_(s)

void hello(void)
{
    fputs("SIZE_MAX = \"" STRINGIFY(SIZE_MAX) "\".\n", stdout);
}

The above bar.c converts the SIZE_MAX preprocessor macro to a string, and prints it. If we had #define SIZE_MAX (size_t)(-1), it would print SIZE_MAX = "(size_t)(-1)".

Note that bar.c includes file size_max.h, which we do not have. This is the header file we intend to generate using our helper program, size_max.c:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    printf("#ifndef SIZE_MAX\n");
    printf("#define SIZE_MAX %lluU\n", (unsigned long long)(size_t)(-1));
    printf("#endif\n");
    return EXIT_SUCCESS;
}

chux noted in a comment that a u suffix (so that the constant gets a sufficiently large unsigned integer type) might be necessary. If that is not what you require, I'm sure you can modify the macro generator helper to suit your needs.

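M.M noted in a comment that %z is not supported by ANSI C/ISO C90, so the above program first creates the constant using (size_t)(-1), then casts and prints it in the unsigned long long format. Note that unsigned long long and %llu are themselves C99 features (often available as an extension); with a strict C89 compiler, cast to unsigned long and print with %lu instead.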

Now, Makefiles can be written in an OS-agnostic manner, but I'm too lazy to do that here, so I shall use the values that work with GNU tools. To make it work on other systems, you only need to modify the values of

  • CC, to reflect the compiler you use

  • CFLAGS, to reflect your preferred compiler options

  • LD, to reflect your linker, unless the same as CC

  • LDFLAGS, if you need some linker flags (maybe -lm?)

  • RM, to reflect the command to delete unnecessary files

  • File names, if your build system requires some funky file name extension for executables

Anyway, here's the Makefile:

CC      := gcc
CFLAGS  := -Wall -O2
LD      := $(CC)
LDFLAGS := $(CFLAGS)
RM      := rm -f

# Programs to be built
PROGS   := example

# Relative path to use for executing the header generator helper program
HEADERGEN := ./headergen

# Rules that do not correspond to actual files
.PHONY: all clean headergen

# Default rule is to build all binaries
all: $(PROGS)

# Clean rule removes build files and binaries
clean:
    -$(RM) $(PROGS) $(HELPROG) *.o size_max.h

# Rule to "rebuild" size_max.h
size_max.h: size_max.c
    -@$(RM) $(HEADERGEN) size_max.h
    @$(CC) $(CFLAGS) $^ -o $(HEADERGEN)
    $(HEADERGEN) > size_max.h
    @$(RM) $(HEADERGEN)

# Rule to build object files from .c source files
%.o: %.c size_max.h
    $(CC) $(CFLAGS) -c $<

# Example binary requires foo.o and bar.o:
example: foo.o bar.o size_max.h
    $(LD) $(LDFLAGS) foo.o bar.o -o $@

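Note that recipe lines must be indented with tabs, not spaces, so if you copy-paste the above, run e.g. sed -e 's|^  *|\t|' -i Makefile (GNU sed; note the two spaces before the asterisk, so that only lines starting with at least one space are changed) to fix it.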

Before zipping or tarring the source tree, run make clean to remove any generated files from it.

Note the extra size_max.h in the recipe prerequisites. It tells make to ensure that size_max.h exists before it can complete the recipe.

The downside of this approach is that you cannot use $^ in link recipes to refer to all prerequisite file names, because it would also list size_max.h. ($< refers only to the first prerequisite file name.) If you use GNU make or a compatible make, you can use $(filter-out %.h, $^) to list all prerequisites except header files, though.

If all your binaries are built from a single source with the same name, you can replace the last two recipes with

# All programs are built from same name source files:
$(PROGS): %: %.c size_max.h
    $(CC) $(CFLAGS) $< $(LDFLAGS) -o $@

On my system, running

make clean all && ./example

outputs

rm -f example  *.o size_max.h
./headergen > size_max.h
gcc -Wall -O2 -c foo.c
gcc -Wall -O2 -c bar.c
gcc -Wall -O2 foo.o bar.o -o example
SIZE_MAX = "18446744073709551615U".

and running

make CC="gcc-5" CFLAGS="-Wall -std=c99 -pedantic -m32" clean all && ./example

outputs

rm -f example  *.o size_max.h
./headergen > size_max.h
gcc-5 -Wall -std=c99 -pedantic -m32 -c foo.c
gcc-5 -Wall -std=c99 -pedantic -m32 -c bar.c
gcc-5 -Wall -std=c99 -pedantic -m32 foo.o bar.o -o example
SIZE_MAX = "4294967295U".

Note that make does not detect changes to compiler options, whether you edit the Makefile or pass different CFLAGS= or CC= values when running make. In those cases you need to specify the clean target first, to ensure you start from a clean slate with the new settings in effect.

During normal editing and builds, when you don't change compilers or compiler options, there is no need to make clean between builds.

Nominal Animal
  • If you are going to assume `unsigned long` (as indicated by the suffix) you may as well just do `#define SIZE_MAX ULONG_MAX`. There are C89 platforms where `size_t` is not `unsigned long` (e.g. one of the MS-DOS memory models has 16-bit size_t and 32-bit long). – M.M Jun 07 '17 at 02:58
  • I do not see any benefit using `UL` versus `U` as the constant will become as wide as needed even without an `L` or `LL`. `UL` could make `SIZE_MAX` wider than needed, even if it is only a unicorn platform. – chux - Reinstate Monica Jun 07 '17 at 04:04
  • @M.M: No, not my intent. I was not aware of e.g. C11 6.4.4.1p5, *"The type of an integer constant is the first of the corresponding list in which its value can be represented"*, with the corresponding lists of no-suffix decimal integers only containing `int`, `long int`, and `long long int`; and `u`-suffix decimal integers containing `unsigned int`, `unsigned long int`, and `unsigned long long int`; i.e. that no-suffix decimal constants will be signed (but as large as needed), and that `u` suffix decimal constants will be unsigned but as large as needed, as chux commented above. – Nominal Animal Jun 07 '17 at 04:17