33

Is size_t the word size of the machine that compiled the code?

Parsing with g++, my compiler views size_t as a long unsigned int. Does the compiler internally choose the size of size_t, or is size_t actually typedef'd inside some pre-processor macro in stddef.h to the word size before the compiler gets invoked?

Or am I way off track?
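
A minimal way to see what a given implementation chose (a sketch: the `%zu` format and `SIZE_MAX` from `<cstdint>` assume C++11):

```cpp
// check.cpp: print what this particular implementation chose for size_t.
// Build with e.g.: g++ -std=c++11 check.cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX
#include <cstdio>

int main() {
    std::printf("sizeof(size_t) = %zu bytes\n", sizeof(std::size_t));
    std::printf("sizeof(void *) = %zu bytes\n", sizeof(void *));  // often, but not always, the same
    std::printf("SIZE_MAX       = %zu\n", static_cast<std::size_t>(SIZE_MAX));
    return 0;
}
```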

gone
  • Don't worry. This is a valid question. – scones Feb 09 '13 at 22:02
  • What are you trying to do? – Nemo Feb 09 '13 at 22:05
  • I'm just trying to understand what it is. – gone Feb 09 '13 at 22:13
  • Then the quotes from the standards below answer your question precisely. Your compiler can use any unsigned integral type it wants for `size_t` as long as it is large enough to represent the size of any object, and that is all you can assume about it when writing C or C++ code. – Nemo Feb 09 '13 at 22:16
  • Although all these answers are correct, I didn't see anyone mention that `size_t` is *quite often* the word-size of the machine. (And by "quite often", I literally mean: almost all - as in, I've never heard of a single environment where it isn't.) – Mysticial Feb 09 '13 at 23:42
  • @Mysticial lol thank you. That clears some confusion up for sure – gone Feb 09 '13 at 23:55
  • @ZacharyO'Keefe: Careful, *word* has more than one meaning or interpretation when it comes to the topic of CPU architecture, sometimes it means the native register size, sometimes it means the addressable size between byte and int or byte and long, sometimes it means a 16-bit unit. Some CPUs have more than one native register size: x86 has gone from 8 to 16 to 32 to 64. All that I've just said is a simplification but I can guarantee you'll get answers or comments you find unhelpful because of this term. – hippietrail Feb 10 '13 at 00:52

6 Answers

24

In the C++ standard, [support.types] (18.2) /6: "The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object."

This may or may not be the same as a "word size", whatever that means.
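
A compile-time illustration of that wording (a sketch, assuming a C++11 compiler): whatever unsigned integer type the implementation picked, a sizeof expression has exactly that type.

```cpp
// The standard pins down the *type* of sizeof, not its width.
#include <cstddef>       // std::size_t
#include <type_traits>   // std::is_same

static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "a sizeof expression has type std::size_t by definition");

int main() { return 0; }
```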

Pete Becker
  • by word size, I mean the number of bytes in an addressable piece of memory – gone Feb 09 '13 at 22:05
  • @ZacharyO'Keefe I don't think that's what you mean. A byte is by definition the smallest piece of addressable memory. – Seth Carnegie Feb 09 '13 at 22:06
  • @ZacharyO'Keefe: the correct term for the minimum addressable unit is *byte*, and in C and C++ a `char` is one byte, per definition (namely, `sizeof(char)` = 1) – Cheers and hth. - Alf Feb 09 '13 at 22:06
  • Ok. So say I have X GB of RAM => X*10^9 bytes. Then `size_t` would be typedef'd to the smallest of the unsigned integral types, `unsigned char`, `unsigned int`, `long unsigned int`, that can express integers up to, and including, X*10^9? – gone Feb 09 '13 at 22:11
  • @ZacharyO'Keefe: The quote from the standard says nothing about "smallest" and nothing about how much memory the machine happens to have. It is large enough to hold the size of _any object_ that the implementation can allocate; that is the beginning and the end of what you can assume about it. – Nemo Feb 09 '13 at 22:13
  • @ZacharyO'Keefe No. As the answer says, it's large enough to express sizes your C++ implementation permits for objects (where "object" does not mean "instance of a `class`" but rather "a piece of memory", e.g. an array). That may be much more or much less than the address space (and how much physical memory the machine has should have little effect in *that*!). Also note that, at this level of abstraction, it's more useful and customary to express memory quantities in terms of powers in two, i.e. X GB would be X *GiB* = X*2^30 bytes. –  Feb 09 '13 at 22:13
  • @ZacharyO'Keefe - it doesn't matter how much RAM you have. The compiler is designed to handle objects up to some size, based on the compiler writer's design goals and the writer's understanding of the target platform. – Pete Becker Feb 09 '13 at 22:13
  • @SethCarnegie Ya sorry. I mean the number of bytes used to hold an address. – gone Feb 09 '13 at 22:16
  • `size_t` doesn't have to be the size of a pointer, no. It often is, but you could have a 32-bit segmented x86 model, where a pointer is 48 bits, but one "object" can still only be 32 bits, so `size_t` would be a 32-bit value. – Mats Petersson Feb 09 '13 at 22:19
  • @delnan Maybe what I'm confused about is the _any object_ part. Surely this must be limited to how much memory the machine has? Else how would this number possibly be bounded? – gone Feb 09 '13 at 22:22
  • @MatsPetersson wouldn't a 32-bit segmented x86 have 32-bit pointers? – gone Feb 09 '13 at 22:24
  • @ZacharyO'Keefe - re: "bounded" - fair question. The compiler doesn't know how much memory the system that will run the program has, so deciding what the largest possible object is doesn't depend on the amount of available memory. If there isn't enough memory to create an object, the program will fail at runtime, sometimes in mysterious ways. – Pete Becker Feb 09 '13 at 22:25
  • I said "segmented x86", meaning that each block of memory is associated with a segment, which means that pointers would have a 16-bit segment and a 32-bit "offset within segment". Which is a perfectly valid, although rather unusual, mode to run the processor in. – Mats Petersson Feb 09 '13 at 22:26
  • @PeteBecker I didn't know that. Interesting. Ok final attempt: `size_t` is defined (by.. my compiler?) to be big enough to contain the maximum possible number of bytes I can allocate to some variable? – gone Feb 09 '13 at 22:38
  • @ZacharyO'Keefe - yes, that's it. That maximum number is built in to the compiler. – Pete Becker Feb 09 '13 at 22:43
  • @ZacharyO'Keefe, I still remember (shudder!) programming the 8086, with segments of 64KiB and up to 1MiB RAM via segments. Pointers were 32 bits (16-bit segment (lowest bits unused) + 16-bit offset); the largest object you could create was 64KiB, i.e., `size_t` was an `unsigned int` at 16 bits. No, it wasn't a sane setup, but it was the best that could be done. Not all architectures are as regular as the ones we have been spoiled with lately. – vonbrand Feb 11 '13 at 21:19
  • @vonbrand - I was at Borland, working on compilers and runtime libraries for DOS and Windows. Those were the days. None of this wimpy flat segmentation. – Pete Becker Feb 11 '13 at 23:20
15

No; size_t is not necessarily whatever you mean by 'the word size' of the machine that will run the code (in the case of cross-compilation) or that compiled the code (in the normal case where the code will run on the same type of machine that compiled the code). It is an unsigned integer type big enough to hold the size (in bytes) of the largest object that the implementation can allocate.


Some history of sizeof and size_t

I don't know when size_t was introduced exactly, but it was between 1979 and 1989. The 1st Edition of K&R The C Programming Language from 1978 has no mention of size_t. The 7th Edition Unix Programmer's Manual has no mention of size_t at all, and that dates from 1979. The book "The UNIX Programming Environment" by Kernighan and Pike from 1984 has no mention of size_t in the index (nor of malloc() or free(), somewhat to my surprise), but that is only indicative, not conclusive. The C89 standard certainly has size_t.

The C99 Rationale documents some information about sizeof() and size_t:

6.5.3.4 The sizeof operator

It is fundamental to the correct usage of functions such as malloc and fread that sizeof(char) be exactly one. In practice, this means that a byte in C terms is the smallest unit of storage, even if this unit is 36 bits wide; and all objects are composed of an integer number of these smallest units. (Also applies if memory is bit addressable.) C89, like K&R, defined the result of the sizeof operator to be a constant of an unsigned integer type. Common implementations, and common usage, have often assumed that the resulting type is int. Old code that depends on this behavior has never been portable to implementations that define the result to be a type other than int. The C89 Committee did not feel it was proper to change the language to protect incorrect code.

The type of sizeof, whatever it is, is published (in the library header <stddef.h>) as size_t, since it is useful for the programmer to be able to refer to this type. This requirement implicitly restricts size_t to be a synonym for an existing unsigned integer type. Note also that, although size_t is an unsigned type, sizeof does not involve any arithmetic operations or conversions that would result in modulus behavior if the size is too large to represent as a size_t, thus quashing any notion that the largest declarable object might be too big to span even with an unsigned long in C89 or uintmax_t in C99. This also restricts the maximum number of elements that may be declared in an array, since for any array a of N elements,

N == sizeof(a)/sizeof(a[0])

Thus size_t is also a convenient type for array sizes, and is so used in several library functions. [...]

7.17 Common definitions

<stddef.h> is a header invented to provide definitions of several types and macros used widely in conjunction with the library: ptrdiff_t, size_t, wchar_t, and NULL. Including any header that references one of these macros will also define it, an exception to the usual library rule that each macro or function belongs to exactly one header.

Note that this specifically mentions that <stddef.h> was invented by the C89 committee. I've not found words that say that size_t was also invented by the C89 committee, but if it was not, it was a codification of a fairly recent development in C.


In a comment on bmargulies' answer, vonbrand says that 'it [size_t] is certainly an ANSI-C-ism'. I can very easily believe that it was an innovation with the original ANSI (ISO) C, though it is mildly odd that the rationale doesn't state that.
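
The element-count idiom from the rationale quoted above is easy to demonstrate (a sketch; it only works on genuine arrays, not on pointers):

```cpp
// N == sizeof(a)/sizeof(a[0]): the element-count formula the rationale cites.
#include <cstddef>   // std::size_t
#include <cstdio>

int main() {
    int a[13];
    std::size_t n = sizeof(a) / sizeof(a[0]);  // total bytes / bytes per element
    std::printf("%zu elements\n", n);          // prints: 13 elements
    return 0;
}
```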

Jonathan Leffler
  • How is cross compilation related? – R.. GitHub STOP HELPING ICE Feb 09 '13 at 22:06
  • The question asks if the size of `size_t` is related to the word size of the machine on which the code was compiled. If you're cross-compiling for an 8-bit or 16-bit microprocessor on a 64-bit machine, the chances are that the size of `size_t` for the program bears no resemblance to the size of the word on the machine doing the compiling — hence my mention of cross-compiling. – Jonathan Leffler Feb 09 '13 at 22:10
  • @JonathanLeffler 'It is an unsigned integer type big enough to hold the size (in bytes) of the largest object that the implementation can allocate'. Since you can only allocate to the address space, `size_t` must be bounded by how much memory is in the address space? – gone Feb 09 '13 at 22:29
  • @ZacharyO'Keefe: Usually, you are correct, but claiming _must_ is dangerous unless you don't need to ask the question in the first place (and may be dangerous to claim even if you think you know the answer inside out and back to front). The C89 standard simply says that the `<stddef.h>` header defines a type '_`size_t` which is the unsigned integral type of the result of the `sizeof` operator_'. C99 changes 'integral' to 'integer', and C11 is the same as C99. – Jonathan Leffler Feb 09 '13 at 22:59
  • The standard is not much more informative about the `sizeof` operator; it says: '_The value of the result [of `sizeof`] is implementation-defined, and its type (an unsigned integer type) is `size_t`, defined in `<stddef.h>` (and other headers)._' – Jonathan Leffler Feb 09 '13 at 23:00
  • @JonathanLeffler: Many compilers (e.g. gcc, msvc) treat `sizeof` as a known built-in. But can `sizeof(sizeof(int))` be evaluated in the absence of `<stddef.h>` or any other header files, by any standard? – Joseph Quinsey Feb 16 '13 at 16:25
  • Correction: I misread C99 *6.5.3.4 The sizeof and alignof operators*. `"(5) ...its type (an unsigned integer type) is size_t, defined in <stddef.h>."` The type of `sizeof` is *not* defined in `<stddef.h>`. Rather, it is `size_t` which is defined, and `size_t` needs to match the otherwise-unnamed type of `sizeof` (implicitly). – Joseph Quinsey Feb 16 '13 at 20:25
3

Not necessarily. The ISO C spec (C99, §7.17 ¶2) defines size_t as

size_t, which is the unsigned integer type of the result of the sizeof operator

In other words, size_t has to be large enough to hold the result of any sizeof expression. This could be the machine word size, but it could be dramatically smaller (if, for example, the compiler limited the maximum size of arrays or objects) or dramatically larger (if the compiler were to let you create objects so huge that a single machine word could not store their size).
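
One way to see the bound your implementation actually advertises (a sketch; this is a property of the type, not a promise that an allocation that large will ever succeed):

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>
#include <limits>    // std::numeric_limits

int main() {
    // The largest value size_t can represent (equivalently, SIZE_MAX).
    std::printf("size_t can count up to %zu\n",
                std::numeric_limits<std::size_t>::max());
    // size_t and pointers may have different widths, as noted above.
    std::printf("sizeof(size_t) = %zu, sizeof(void *) = %zu\n",
                sizeof(std::size_t), sizeof(void *));
    return 0;
}
```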

Hope this helps!

templatetypedef
  • Ahh. So `size_t` is set by the implementation of what my compiler allows? Since the maximum amount of memory I can allocate is clearly bounded by how much physical memory I have, is `size_t` bounded by how much physical memory I have to contribute to the address space? – gone Feb 09 '13 at 22:27
  • @ZacharyO'Keefe- It's completely up to the compiler's discretion. The compiler could very reasonably make `size_t` large enough to hold the difference of any two physical addresses (the difference of the end and start addresses of an object gives its size). On the other hand, you could imagine a weird setup where you have a 128-bit address space but an allocator that can't allocate more than 4GB, in which case the machine could use a 32-bit integer for `size_t`. The only way to know is to look at the compiler documentation. – templatetypedef Feb 09 '13 at 22:31
  • Thank you, that certainly helped. So what happens if I try to allocate (statically or dynamically) an array of 2^size_t chars? (ie, 1 more than size_t can represent). Will the compiler complain? – gone Feb 09 '13 at 22:47
  • @ZacharyO'Keefe: Statically, the compiler will probably complain. Dynamically, it will probably wrap around and you'll end up allocating 0. – icktoofay Feb 09 '13 at 22:49
  • @ZacharyO'Keefe- The argument to `malloc` (C) and `operator new` (C++) has type `size_t`, so if you try to pass in a larger value the number will overflow. – templatetypedef Feb 09 '13 at 22:50
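
Picking up the overflow point from the two comments above: the usual defensive pattern is to test for wrap-around before the product ever reaches the allocator. A sketch (`checked_alloc` is a made-up helper name, not a standard function):

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX
#include <cstdlib>   // std::malloc, std::free

// Returns nullptr instead of silently wrapping when count * elem_size
// would exceed what size_t can represent.
void *checked_alloc(std::size_t count, std::size_t elem_size) {
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return nullptr;                      // product would wrap modulo 2^N
    return std::malloc(count * elem_size);   // safe: the product fits in size_t
}

int main() {
    void *p = checked_alloc(1u << 20, sizeof(double));  // 1 Mi doubles
    std::free(p);
    return 0;
}
```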
1

size_t was, originally, just a typedef in sys/types.h (traditionally on Unix/Linux). It was assumed to be 'big enough' for, say, the maximum size of a file, or the maximum allocation with malloc. However, over time, standards committees grabbed it, and so it wound up copied into many different header files, protected each time with its own #ifdef protection from multiple definition. On the other hand, the emergence of 64-bit systems with very big potential file sizes clouded its role. So it's a bit of a palimpsest.

Language standards now call it out as living in stddef.h. It has no necessary relationship to the hardware word size, and no compiler magic. See other answers with respect to what those standards say about how big it is.

bmargulies
0

Such definitions are all implementation-defined. I would use sizeof(char *), or maybe sizeof(void *), if I needed a best-guess size. At best this gives the apparent word size the software uses; what the hardware really has may be different (e.g., a 32-bit system may support 64-bit integers in software).

Also, if you are new to the C languages, see stdint.h for all sorts of material on integer sizes.
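
A sketch of that best-guess comparison, together with the stdint.h (in C++, `<cstdint>`) types mentioned above. Note that uintptr_t is an optional type, and none of these widths are guaranteed to agree with one another:

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t (optional), std::uintmax_t
#include <cstdio>

int main() {
    std::printf("void *:    %zu bytes\n", sizeof(void *));          // apparent address width
    std::printf("size_t:    %zu bytes\n", sizeof(std::size_t));     // largest object size
    std::printf("uintptr_t: %zu bytes\n", sizeof(std::uintptr_t));  // can round-trip a pointer
    std::printf("uintmax_t: %zu bytes\n", sizeof(std::uintmax_t));  // widest unsigned type
    return 0;
}
```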

Gilbert
0

Although the definition does not directly state exactly what type size_t is, and does not even require a minimum size, it indirectly gives some good hints. A size_t must be able to contain the size in bytes of any object; in other words, it must be able to contain the size of the largest possible object.

The largest possible object is an array (or structure) with a size equal to the entire available address space. It is not possible to reference a larger object in a meaningful manner, and apart from the availability of swap space there is no reason why it should need to be any smaller.

Therefore, by the wording of the definition, size_t must be at least 32 bits on a 32 bit architecture, and at least 64 bits on a 64 bit system. It is of course possible for an implementation to choose a larger size_t, but this is not usually the case.
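
If you rely on assumptions like these, you can turn them into compile-time checks rather than trusting them. A sketch, assuming C++11; the second assertion encodes the common flat-address-space case described here, which is not a standard guarantee:

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX

// The standard's floor for SIZE_MAX (see the comment below citing §7.18.3).
static_assert(SIZE_MAX >= 65535, "SIZE_MAX must be at least 65535");

// Common on flat 32-bit and 64-bit targets, but NOT a standard guarantee.
static_assert(sizeof(std::size_t) == sizeof(void *),
              "size_t expected to match pointer width on this target");

int main() { return 0; }
```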

Damon
  • §7.18.3 ¶2 sets a minimum maximum value for `size_t` at 65535. – icktoofay Feb 09 '13 at 22:40
  • Your second paragraph (and hence, everything that builds on that) assumes that all implementations must support objects as large as the entire address space. This is not the case AFAIK. Do you have a reference for that? –  Feb 09 '13 at 22:44
  • @delnan: An implementation is required to accept an array declaration with an "integral constant expression greater than zero" bound. That wording allows, in principle, a bound of _any size_, as long as it is a constant (even a number with 500 digits?). But of course, nothing bigger than the address space makes sense, since there is no meaningful way to do anything with it. You will of course in practice not be able to ever allocate such an object (program code must go somewhere, plus fragmentation, etc.), but this doesn't exclude the possibility that an object could, in theory, be that size. – Damon Feb 09 '13 at 23:12
  • Okay, but as there will be an arbitrary implementation-defined limit anyway (at least the address space of the target platform, as you note) I don't see why this limit is okay but a limit of, say, half of that wouldn't be valid. I don't think the C and C++ standards even concern themselves with concepts such as address spaces. –  Feb 09 '13 at 23:22
  • I don't think any of the standards considers something like an address space, no. It's just a practical limit that you inevitably run against. There might be a twisted clause somewhere (I'm not aware of one) that allows for some arbitrary lower limit, but I don't think this is the case for the actual size of a type. An implementation is of course never required to actually _create_ an object of any size (for whatever reason), e.g. you will usually get `bad_alloc` even for an array less than 1/4 the size of your address space, but that isn't the same thing as the type not being supported. – Damon Feb 09 '13 at 23:33
  • But if the implementation *knows* anything larger than K isn't going to work, why would it pretend to support it by making `size_t` larger? Apart from simplicity (a 32 bit integer is easier to get by than a 30 bit integer) of course. More realistically, what about a compiler targeting the x32 ABI, i.e. 64 bit code (native 64 bit integers, 64 bit registers, 64 bit instructions, etc.) but 32 bit pointers? The address space of the processor is technically 64 bit, but the implementation opts to only use 32 bit of that, so it can use a 32 bit `size_t`. Would you consider this non-conforming? –  Feb 09 '13 at 23:38
  • The second part is easier to answer: x32 is really a pure 32-bit ABI (with the CPU running in 64-bit mode), which enables registers that are normally available only in 64-bit mode. You can consider this a nasty hack (a clever one, nevertheless). Whether the address space is _technically_ 64-bit doesn't matter much, because anything that's outside the 32-bit addressable range is _The Unknown Land_, so for all the application can tell, it is a 32-bit architecture (you _could_ use the extra bits in the register, but it's meaningless because it will just segfault). – Damon Feb 11 '13 at 19:27
  • The first part is a bit moot. Hardware reality has it that you can make a value (including a pointer) 8, 16, 32, or 64 bits. In that respect, it doesn't really matter whether you have 30 or 32 bits of available address space; anything exceeding 64KiB needs 32 bits. Would it be valid for an implementation to have a 30-bit `size_t`? Certainly, but the hardware won't allow for it. – Damon Feb 11 '13 at 19:31