
This post does not have an answer to my question.

Consider this:

```c
#include <limits.h>
#include <stdio.h>

enum seq {VAL1, VAL2 = 1000000000, VAL3 = UINT_MAX};

int main(void)
{
    printf("%lu\n", sizeof(enum seq));
}
```

Here UINT_MAX is the maximum value of an unsigned int (4294967295 when unsigned int is 32 bits wide, the same as UINT32_MAX).

Why does the size of the entire enum type appear to be only 4 bytes? That is only enough to store a single integer value.

Kaiyakha
  • Because it is enough for variables of `enum` to store a single integer value? – MikeCAT Dec 28 '20 at 23:35
  • @MikeCAT constant values must be stored somewhere. Are they stored apart from the `enum` type? (like in rodata or code segments) – Kaiyakha Dec 28 '20 at 23:37
  • An `enum` type is internally an integer type that is large enough to hold any possible value. The constants `VAL1` etc. are replaced at compile time with their values, similar to `#define VAL1 (0)` etc. You can even assign values to an `enum` variable that don't correspond to any of the defined constants. – Bodo Dec 28 '20 at 23:37
  • `sizeof(a type)` tells you how big a value of that type is. There's no way to get at the size of the supporting data the implementation uses for the type, because there's no use for it and your program should not have different behavior conditioned on it. – HTNW Dec 28 '20 at 23:40
  • @HTNW As far as I understand, the type size does not include the memory occupied by the constants in the enum sequence, am I right? This is what confused me. – Kaiyakha Dec 28 '20 at 23:42
  • I don't know what you mean by "the memory occupied by the constants". What constants do you refer to? Why should they occupy memory, and what exactly do you think would be stored in that memory? – Nate Eldredge Dec 28 '20 at 23:43
  • @Kaiyakha "*constant values must be stored somewhere*" That's a misconception about how constants work. They don't have to be stored "*somewhere*", this is for example why you cannot take the address of a constant, so both `&1001` and `&VAL1` are illegal. – dxiv Dec 28 '20 at 23:43
  • Finally, if `VAL1` in `enum` is the same as `#define VAL1 0`, then apparently the constant values are just in rodata or code segments. – Kaiyakha Dec 28 '20 at 23:45
  • @Kaiyakha in your example, what do you expect `sizeof(enum seq)` to be? – Ryan Haining Dec 28 '20 at 23:46
  • @NateEldredge Isn't that obvious? There are no other constants here but those in the `enum` declaration. They have some value which must be remembered somehow, at least two of them here would require 4 bytes to be stored. – Kaiyakha Dec 28 '20 at 23:48
  • @Kaiyakha "*the constant values are just in rodata or code segments*" Again, no. There is no requirement and no guarantee that those constant values are literally "*stored*" anywhere. For example, you can write `int n = VAL1;` in the code, and the compiler may generate `xor eax, eax` in the binary. – dxiv Dec 28 '20 at 23:49
  • @RyanHaining I expected to see 12 here because the standard says each constant is of `int` type and I have 3 constants in my declaration, 3 * 4 = 12 (my compiler considers `int` to be `int32_t`). – Kaiyakha Dec 28 '20 at 23:50
  • @Kaiyakha okay now we're getting somewhere. The sizeof a type is the number of bytes required to store one of that thing. The `sizeof(int)` is four bytes because one int is four bytes. Following your logic, `sizeof(int)` would need to be the number of bytes required to store *all* integers, which would be `4*UINT_MAX` – Ryan Haining Dec 28 '20 at 23:51
  • @RyanHaining I'll ask another way. Does `enum` work like a `union` here? – Kaiyakha Dec 28 '20 at 23:53
  • @RyanHaining `4*UINT_MAX`: should that be `UINT_MAX`? – Ed Heal Dec 28 '20 at 23:53
  • @Kaiyakha - an `enum` is essentially an `int` in C, just able to map names onto an `int`. – Ed Heal Dec 28 '20 at 23:54
  • Finally. If `enum` is only to store a single value of `int` type, where in the memory do I find the constants defined in the `enum` declaration? Are they part of the declared type or not? – Kaiyakha Dec 28 '20 at 23:55
  • @Kaiyakha You find it in the compiler's memory while the program is being compiled. It may or may not appear in the executable file. – user3386109 Dec 28 '20 at 23:56
  • @Kaiyakha "*where in the memory do I find the constants*" The same place where you would find all values that another type (like `unsigned` for example) could possibly take. Which is to say: *nowhere*. – dxiv Dec 28 '20 at 23:58
  • Why does an `enum` or `unsigned int` or `char *` require any memory? A variable of those types in the program will require memory. With a constant, the compiler can be clever enough not to require the program to use memory to store it. It can be hard-coded into the executable. – Ed Heal Dec 29 '20 at 00:08
  • @EdHeal yes, that is why it is just part of the code itself. The value still requires memory to be stored, but not beside variables (static, constant or any other) – Kaiyakha Dec 29 '20 at 00:09
  • @Kaiyakha You are *still* missing the point. With an implementation where integers are 32-bit, for example, an `unsigned` can take `2^32` possible values. Quite obviously, those `4,294,967,296` different constants are not stored anywhere in memory. The exact same way, the constants defined by an `enum` are not "*stored*" anywhere by themselves. – dxiv Dec 29 '20 at 00:09
  • @dxiv probably I misunderstand the term "stored" – Kaiyakha Dec 29 '20 at 00:10
  • @dxiv I'll ask another way. Are constants in `enum` treated the same way as macros of `#define`? – Kaiyakha Dec 29 '20 at 00:13
  • @Kaiyakha Not the same, though they share similarities. More about that under [What makes a better constant in C, a macro or an enum?](https://stackoverflow.com/questions/17125505/what-makes-a-better-constant-in-c-a-macro-or-an-enum), for example. – dxiv Dec 29 '20 at 00:16
  • @Kaiyakha *does enum work like a union here* no, it works like an `int`. An `int` can store a single 4 byte value. Just like an `enum seq` can be `VAL1` or `VAL2` in your example, both of which are 4 bytes. – Ryan Haining Dec 29 '20 at 00:23
  • @EdHeal according to OP it would be the sizeof a single value, times the number of possible values – Ryan Haining Dec 29 '20 at 00:23
  • @RyanHaining - I'd be interested to find out what he(?) thinks about floating point numbers! – Ed Heal Dec 29 '20 at 00:46
  • @EdHeal what are you driving at? – Kaiyakha Dec 29 '20 at 07:15

2 Answers


I think maybe I'm starting to understand your question.

In your example program, the numbers 0, 1000000000 and UINT_MAX do not need to be stored in the program's memory at all, since you do not use them. If for example you look at its assembly output you will not see any of those numbers. That is what the comments mean when they say they are stored "nowhere".

If you did use them, they would very likely be encoded directly into an instruction as an immediate, just as if you had used the integer literals 0 or 1000000000 or 4294967295. See for instance https://godbolt.org/z/6YKeE9. They might also be subjected to constant folding (so you wouldn't encode the number itself, only the result of whatever computation it was used in), or optimized out altogether. But they wouldn't necessarily need to be stored in data memory, unless perhaps you used them to initialize a global or static variable, as here.

And in C, sizeof(type) always gives you the amount of memory used by an object of that type. So even if you had a compiler that did need to store all of the numbers 0, 1000000000 and UINT_MAX in memory somewhere, sizeof(enum seq) would not give you the total amount of memory needed for that; it would only give you the amount of memory needed to store one object of type enum seq. Since a 4-byte unsigned integer is big enough to contain any one of the possible values of enum seq, that's the size you're getting.

Nate Eldredge
  • You got the point! As I said already, _they would very likely be encoded directly into an instruction_. The instruction itself goes to the `code` or `text` segment of the program's memory. So technically they are stored directly among the instructions in that very segment – Kaiyakha Dec 29 '20 at 00:16
  • @Kaiyakha: this mental representation is confusing: the enumeration values are not *stored* among the instructions. They are replaced with their value at compile time and the compiler generates code accordingly. If you wrote `return VAL1;` the compiler could generate `xor eax,eax; ret` which does not contain the enumerated value at all. – chqrlie Dec 29 '20 at 00:21
  • @chqrlie _They are replaced with their value at compile time_. Where does that replacing value come from? From the instructions I guess, which are in the `text` segment. Isn't it? – Kaiyakha Dec 29 '20 at 00:23
  • @Kaiyakha: The replacing value comes from the definition of the `enum` that the compiler parsed and stored in its own representation of the source code. None of that is *stored* into the executable's code or data (but could be encoded into debugging information records that may or may not be part of the executable file). This process is somewhat similar to the preprocessor handling of macros, but occurs at a later stage of the compilation and follows different scoping rules. – chqrlie Dec 29 '20 at 00:26
  • @Kaiyakha "where does that replacing value come from" the compiler can do anything it wants as long as it gets the correct value into the correct location. Maybe the value exists somewhere in the instructions of the binary, but it doesn't need to be. You're thinking about storage in a very unusual way. – Ryan Haining Dec 29 '20 at 00:27
  • @chqrlie still confusing, but at least you got what the question is about, thanks – Kaiyakha Dec 29 '20 at 00:28
  • @Kaiyakha: if you had a function in your program defined as `int f(int VAL1) { return VAL1;}`, this function would return the value of its argument, not the value of the enumeration constant `VAL1` (which is 0), because the identifier `VAL1` as the name of the function argument shadows the global symbol `VAL1` for the scope of the function `f`. This is very different from `#define VAL1 0` which would cause a syntax error at the function definition as `VAL1` would be replaced by the preprocessor unconditionally *before* the compiler parses the source code. – chqrlie Dec 29 '20 at 00:35

In C, there is no way to get the number of enumeration values in an enum, nor the maximum or minimum values. Enumerations are just a handy way to define sets of named constant values, but these values are not stored anywhere. sizeof(enum seq) is the size of the type used by the compiler to represent values of the enumerated type, which is implementation-defined, but must be able to represent all of the enumeration constants. In your example the compiler seems to use uint32_t for this type, as all constants fit in it, hence sizeof(enum seq) evaluates at compile time to 4.

Note however that the C Standard specifies this:

6.7.2.2 Enumeration specifiers

...

Constraints
The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

Semantics
The identifiers in an enumerator list are declared as constants that have type int...

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration. The enumerated type is incomplete until immediately after the } that terminates the list of enumerator declarations, and complete thereafter.

Therefore the C Standard does not allow UINT_MAX as an enumeration constant value¹, but most compilers extend the semantics to handle larger types for enumerations in a compiler-specific way. If all values fit in type unsigned int but not int, the type of the enumeration could be unsigned int, but the compiler could also use long or even long long.

Also note that you should use %zu for values of type size_t such as the value of sizeof(...), or cast the value to a specific integer type and use the appropriate conversion specification if your library does not support the C99 z conversion modifier.

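```c
#include <limits.h>  // for UINT_MAX
#include <stdio.h>

enum seq { VAL1, VAL2 = 1000000000, VAL3 = UINT_MAX };

int main(void) {
    printf("%zu\n", sizeof(enum seq));      // C99 and later: may print 4
    printf("%d\n", (int)sizeof(enum seq));  // fallback for libraries without %zu
    return 0;
}
```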

1) ignoring the unlikely architectures where INT_MAX == UINT_MAX.

chqrlie