
Will someone please tell me how this makes any sense, and how to make it stop? Seriously, am I crazy or is the 64-bit Windows long type only 4 bytes? How does that make any sense? I thought the native long primitive size was supposed to be the same as the native register size.

[32-bit Linux]

me@u32:~$ ./sizes32
sizeof(char):      1
sizeof(short):     2
sizeof(int):       4
sizeof(long):      4
sizeof(long long): 8

[64-bit Linux]

me@u64:~$ ./sizes64
sizeof(char):      1
sizeof(short):     2
sizeof(int):       4
sizeof(long):      8
sizeof(long long): 8

[32-bit Windows]

C:\Users\me\Downloads>sizes32.exe
sizeof(char):      1
sizeof(short):     2
sizeof(int):       4
sizeof(long):      4
sizeof(long long): 8

[64-bit Windows]

C:\Users\me\Downloads>sizes64.exe
sizeof(char):      1
sizeof(short):     2
sizeof(int):       4
sizeof(long):      4
sizeof(long long): 8
phuclv
kcraigie
  • The only size guaranteed by the standard is `sizeof(char) == 1`. – 101010 Oct 31 '15 at 00:06
  • "I thought the native long primitive size was supposed to be the same as the native register size" Where did you get that idea? – Baum mit Augen Oct 31 '15 at 00:07
  • http://stackoverflow.com/questions/589575/what-does-the-c-standard-state-the-size-of-int-long-type-to-be – Richard Chambers Oct 31 '15 at 00:09
  • @101010 in fact you don't even have this guarantee: "*The fundamental storage unit in the C++ memory model is the byte. A byte is **at least large enough** to contain any member of the basic execution character set*" – Christophe Oct 31 '15 at 00:14
  • @Christophe No, `sizeof(char) == 1` is guaranteed. `char` is *always* 1 byte. – Neil Kirk Oct 31 '15 at 00:18
  • Doesn't this also depend largely, mostly, or possibly *entirely* on the compiler used? (And on specific compiler settings as well, such as target CPU/OS.) – Jongware Oct 31 '15 at 00:18
  • @NeilKirk That indeed matches the usual experience. But can you provide any reference in the standard that guarantees it? In 3.9.1/1 there is no claim that a char is a byte (and by the way, a byte is not necessarily an octet). – Christophe Oct 31 '15 at 00:27
  • @101010: Yet there are some minimum ranges defined for the types. But whoever relies on a specific size of the integer types should not complain if his code breaks. – too honest for this site Oct 31 '15 at 00:46
  • @Christophe: http://port70.net/~nsz/c/c11/n1570.html#6.5.3.4p4 – too honest for this site Oct 31 '15 at 00:48
  • @nos: No. A byte is nowhere defined to be 8 bits. Actually, some decades ago, 9 bits/byte were quite common. That's why e.g. network protocols use the term "octet" to be clear (or define byte as 8 bits). – too honest for this site Oct 31 '15 at 00:49
  • @nos: I wanted to make clear that C very much uses the common definition of byte. Using byte as a synonym for 8 bits is just lax. – too honest for this site Oct 31 '15 at 00:55
  • @nos: Commonly is not the same as lax, but implies a majority using the phrase. If you look at stricter documents – note we are talking about a standard, not a norm – you will see it is not. And the more thoughtful people (or those with a broader background) are also aware of this. Programming and engineering are no place for imprecise language. – too honest for this site Oct 31 '15 at 01:00
  • @Olaf, no, we were talking about a standard vs. a commonly used term outside that standard. Which is why I mentioned the definition of a byte in C. But this discussion is already getting way out of hand. – nos Oct 31 '15 at 01:14
  • For the record, `CHAR_BIT` is the number of bits per byte. – Peter Cordes Oct 31 '15 at 21:59

3 Answers


Backward compatibility!

Windows came from a 16-bit platform where sizeof(long) == 4, and its API makes extensive use of custom types such as LONG and DWORD. Microsoft takes backward compatibility very seriously (sometimes even patching its own code to keep badly written old programs working), and changing the size of long would have caused a lot of problems.

Over on Channel 9, member Beer28 wrote, "I can't imagine there are too many problems with programs that have type widths changed." I got a good chuckle out of that and made a note to write up an entry on the Win64 data model.

The Win64 team selected the LLP64 data model, in which all integral types remain 32-bit values and only pointers expand to 64-bit values. Why?

In addition to the reasons given on that web page, another reason is that doing so avoids breaking persistence formats. For example, part of the header data for a bitmap file is defined by the following structure:

typedef struct tagBITMAPINFOHEADER {
        DWORD      biSize;
        LONG       biWidth;
        LONG       biHeight;
        WORD       biPlanes;
        WORD       biBitCount;
        DWORD      biCompression;
        DWORD      biSizeImage;
        LONG       biXPelsPerMeter;
        LONG       biYPelsPerMeter;
        DWORD      biClrUsed;
        DWORD      biClrImportant;
} BITMAPINFOHEADER, FAR *LPBITMAPINFOHEADER, *PBITMAPINFOHEADER;

If a LONG expanded from a 32-bit value to a 64-bit value, it would not be possible for a 64-bit program to use this structure to parse a bitmap file.

Why did the Win64 team choose the LLP64 model?

phuclv
  • But Microsoft has long used their own definition `LONG`. Surely they could simply have typedef'ed or define'd their preferred "native" integral type? It would only be an issue for those using `long` instead of `LONG`. – Jongware Oct 31 '15 at 00:26
  • @Jongware but people also use `long` a lot in legacy code from when int was 16 bits – phuclv Oct 31 '15 at 00:28
  • True, but I believe the entire switch to capitalized integral types was to prevent exactly these issues. Legacy code can then easily be repaired. I had the same problem when I switched from Borland Turbo C (which had 16-bit `int`) to more modern compilers. – Jongware Oct 31 '15 at 00:33
  • @Jongware but I think MS often worries too much about legacy code. One example is the file system redirector in WoW64 – phuclv Oct 31 '15 at 01:23
  • Note that size_t is 64 bits in Windows 64-bit mode, or more accurately, as defined in Visual Studio for 64-bit Windows. I'm not sure if defining long to remain at 32 bits is related to Windows or to the compilers. – rcgldr Oct 31 '15 at 05:24

`long` has to be at least 32 bits, at least as big as `int`, and no bigger than `long long`. That's it. Period.

Neil Kirk

You've already received plenty of valid answers.

Just for the record, here is the precise definition from the C++ standard:

3.9.1/2: There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list. (...) Plain ints have the natural size suggested by the architecture of the execution environment (44).

The last sentence suggests that int has the size corresponding to the register size. Unfortunately, the footnote doesn't tell the full story; it just says: "(44) that is, large enough to contain any value in the range of INT_MIN and INT_MAX, as defined in the header <climits>".

Christophe
  • 2
    I'd say x86-64's "natural size" is 32bit. In 64bit-mode, the default operand size is still 32, so REX prefix byte is required to operand on 64bit registers/memory. Code density is higher when your variables are 32bit. Other than instruction-cache and fetch/decode bottleneck issues, though, using 64bit variables is just as fast. Also data-cache issues if we're talking about an array using twice as much space, instead of just local variables that mostly fit in registers. – Peter Cordes Oct 31 '15 at 21:55
  • 1
    See also [The advantages of using 32bit registers/instructions in x86-64](//stackoverflow.com/q/38303333) – Peter Cordes Dec 19 '19 at 16:01
  • @PeterCordes Thank you very much Peter for these interesting elements ! – Christophe Dec 19 '19 at 16:25