7

I've just started learning the WinAPI. In the MSDN, the following explanation is provided for the WORD data type.

WORD
A 16-bit unsigned integer. The range is 0 through 65535 decimal.
This type is declared in WinDef.h as follows:
typedef unsigned short WORD;

Simple enough, and it matches the other resources I've been using for learning, but how can it be definitively said that it is 16 bits? The C data types page on Wikipedia specifies

short / short int / signed short / signed short int
Short signed integer type.
Capable of containing at least the [−32767, +32767] range; thus, it is at least 16 bits in size.

So the size of a short could very well be 32 bits according to the C Standard. But who decides what bit sizes are going to be used anyway? I found a practical explanation here. Specifically, the line:

...it depends on both processors (more specifically, ISA, instruction set architecture, e.g., x86 and x86-64) and compilers including programming model.

So it's the ISA then, which makes sense I suppose. This is where I get lost. Taking a look at the Windows page on Wikipedia, I see this in the side bar:

Platforms ARM, IA-32, Itanium, x86-64, DEC Alpha, MIPS, PowerPC

I don't really know what these are but I think these are processors, each of which would have an ISA. Maybe Windows supports these platforms because all of them are guaranteed to use 16 bits for an unsigned short? This doesn't sound quite right, but I don't really know enough about this stuff to research any further.

Back to my question: How is it that the Windows API can typedef unsigned short WORD; and then say WORD is a 16-bit unsigned integer when the C Standard itself does not guarantee that a short is always 16 bits?

Community
  • 1
  • 1
codegrumps
  • 122
  • 6
  • 10
    The standard says the size of `short` is at least 16 bits. The exact size is to be decided by the implementation. Microsoft ***is*** the implementation and they chose 16. – Mysticial May 18 '16 at 19:27
  • 4
    If Microsoft defines ABIs for their platform where `short` is always 16 bits long, then `short` is always 16 bits long on Microsoft platforms. It's their decision. – EOF May 18 '16 at 19:28
  • 1
    AFAIK the types `WORD` and `DWORD` predate `uint16_t` and `uint32_t`. – Weather Vane May 18 '16 at 19:44
  • Please try and stick to one question per post... you have 5 different questions in your question – M.M May 18 '16 at 20:44
  • 1
    All those questions are part of a narrative. Read the final paragraph. – David Heffernan May 18 '16 at 21:04
  • 1
    @WeatherVane: I wonder if there's any guarantee that "DWORD" and "uint32_t" will be alias-compatible? If a platform with 32-bit `int` and `long` were to use `unsigned int` for `uint32_t` and `unsigned long` for `DWORD`, modern versions of gcc would assume that a write via `DWORD*` can never modify a `uint32_t`, and a write via `uint32_t*` can never modify a `DWORD`. – supercat May 18 '16 at 21:05
  • 1
    MS is saying `WORD` is 16-bits. It's not saying `short` is of given size. – Ajay May 19 '16 at 07:50

6 Answers

9

Simply put, a WORD is always 16 bits.

As a WORD is always 16 bits, but an unsigned short is not, a WORD is not always an unsigned short.

For every platform that the Windows SDK supports, the Windows header files contain #ifdef-style macros that detect the compiler and its platform, and map the Windows SDK-defined types (WORD, DWORD, etc.) to the appropriately sized platform types.
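
As a rough illustration only (this is not the real windows.h; apart from _MSC_VER, the macro names below are invented for this sketch), such a mapping could look something like:

    /* Hypothetical sketch of #ifdef-style type selection. Only _MSC_VER is a
       real predefined macro; the others are placeholders for illustration. */
    #if defined(_MSC_VER)                  /* Microsoft's compilers: short is 16 bits */
      typedef unsigned short WORD;
      typedef unsigned long  DWORD;
    #elif defined(SOME_OTHER_COMPILER)     /* invented macro: another compiler with a 16-bit short */
      typedef unsigned short WORD;
      typedef unsigned int   DWORD;
    #else
      #error "No known 16-bit unsigned integer type for this compiler"
    #endif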

This is WHY the Windows SDK actually uses internally defined types, such as WORD, rather than using language types: so that they can ensure that their definitions are always correct.

The Windows SDK that ships with Microsoft toolchains is possibly lazy about this, as Microsoft's C++ toolchains always use 16-bit unsigned shorts.

I would not expect the windows.h that ships with Visual Studio C++ to work correctly if dropped into GCC, Clang, etc., as so many details, including the mechanism of importing DLLs using the .lib files that the Platform SDK distributes, are Microsoft-specific implementations.


A different interpretation is that:

Microsoft says a WORD is 16 bits. If "someone" wants to call a Windows API, they must pass a 16-bit value where the API defines the field as a WORD. Microsoft also possibly says that, in order to build a valid Windows program using the Windows header files present in their Windows SDK, the user MUST choose a compiler that has a 16-bit short.

The C++ spec does not say that compilers must implement shorts as 16 bits - Microsoft says the compiler you choose to build Windows executables must.

Chris Becke
  • 34,244
  • 12
  • 79
  • 148
  • @DavidHeffernan they can't, but they can document that the compiler vendor needs to ensure that the appropriate datatype gets mapped to the Windows type for each size. The Windows libraries are built under a certain set of assumptions (the ABI) that must be met by the toolchains used to develop against it. – Joe May 18 '16 at 23:32
  • @DavidHeffernan That's not the question. That's a piece of misdirection the OP has included in the exposition around the question. But the question itself relates to the size of WORD. – Chris Becke May 19 '16 at 08:17
  • 1
    Funny, these could be implemented portably now: `#include <stdint.h>` and `typedef uint16_t WORD;` – Joshua May 24 '19 at 18:42
8

There was originally an assumption that all code intended to run on Windows would be compiled with Microsoft's own compiler - or a fully compatible compiler. And that's the way it worked. Borland C: Matched Microsoft C. Zortech's C: Matched Microsoft C. gcc: not so much, so you didn't even try (not to mention there were no runtimes, etc.).

Over time this concept got codified and extended to other operating systems (or perhaps the other operating systems got it first) and now it is known as an ABI - Application Binary Interface - for a platform, and all compilers for that platform are assumed (in practice, required) to match the ABI. And that means matching expectations for the sizes of integral types (among other things).

An interesting related question you didn't ask is: So why is 16-bits called a word? Why is 32-bits a dword (double word) on our 32- and now 64-bit architectures where the native machine "word" size is 32- or 64-, not 16? Because: 80286.

davidbak
  • 5,775
  • 3
  • 34
  • 50
  • Some research has told me that 16 bits is called a word for historical reasons, and that it hasn't been changed because of compatibility issues. I was thrown off by that as well but it was easier for me to understand than the actual typedef statement. – codegrumps May 18 '16 at 19:47
  • 3
    @brokyle - exactly. In the future when we're running on 128-bit von Neumann machines or 8 qubit quantum machines, our _Windows code_ will still be using 16 bit WORDs and 32 bit DWORDs. Because: 80286. – davidbak May 18 '16 at 19:50
  • 1
    "word=16bits, dword=32bits, qword=64bits" is everywhere in Intel's [asm documentation and syntax](http://www.felixcloutier.com/x86/). e.g. the `pshufd` insn mnemonic (`_mm_shuffle_epi32`) is Packed (integer) Shuffle Dword. `psraw` is Packed Shift Right Arithmetic Word. (packed-FP insns use a ps or pd suffix instead of `p` prefix.) See also the [x86 tag wiki for more links](http://stackoverflow.com/tags/x86/info). The original reason for the terminology: 8086. instructions like `cbw` (sign extend al into ax) vs. `cwd` (sign extend ax into dx:ax). 386 added `cwde` (sign-extend ax into eax) – Peter Cordes May 19 '16 at 02:50
2

In the Windows headers there are a lot of #defines that, based on the platform, can ensure a WORD is 16 bits, a DWORD is 32, etc. In some cases in the past, I know they distributed a separate SDK for each platform. In any case there is nothing magic about it, just a mixture of proper #defines and headers.
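
A minimal sketch of how a header could back up those fixed-size claims with compile-time checks (using C11 _Static_assert; the typedefs assume Microsoft's data model and are for illustration only):

    #include <limits.h>

    typedef unsigned short WORD;   /* assumed 16-bit on this toolchain */
    typedef unsigned long  DWORD;  /* assumed 32-bit on this toolchain */

    _Static_assert(sizeof(WORD)  * CHAR_BIT == 16, "WORD must be 16 bits");
    _Static_assert(sizeof(DWORD) * CHAR_BIT == 32, "DWORD must be 32 bits");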

Felice Pollano
  • 32,832
  • 9
  • 75
  • 115
  • The question is how MS can be sure that short is 16 bits when the C standard does not mandate that. – David Heffernan May 18 '16 at 20:09
  • @DavidHeffernan there are a lot of questions in OP post, there is more emphasis on the question "How can MS say that WORD is 16-bit" – M.M May 18 '16 at 20:41
  • @DavidHeffernan read the title, note the question mark on the end – M.M May 18 '16 at 20:59
  • @DavidHeffernan there are 4 different questions in the body. For example, " But who decides what bit sizes are going to be used anyway?" – M.M May 18 '16 at 21:09
2

The BYTE=8bits, WORD=16bits, and DWORD=32bits (double-word) terminology comes from Intel's instruction mnemonics and documentation for 8086. It's just terminology, and at this point doesn't imply anything about the size of the "machine word" on the actual machine running the code.

My guess:

Those C type names were probably originally introduced for the same reason that C99 standardized uint8_t, uint16_t, and uint32_t. The idea was probably to allow C implementations with an incompatible ABI (e.g. 16 vs. 32-bit int) to still compile code that uses the WinAPI, because the ABI uses DWORD rather than long or int in structs, and function args / return values.

Probably as Windows evolved, enough code started depending in various ways on the exact definition of WORD and DWORD that MS decided to standardize the exact typedefs. This diverges from the C99 uint16_t idea, where you can't assume that it's unsigned short.

As @supercat points out, this can matter for aliasing rules. e.g. if you modify an array of unsigned long[] through a DWORD*, it's guaranteed that it will work as expected. But if you modify an array of unsigned int[] through a DWORD*, the compiler might assume that didn't affect array values that it already had in registers. This also matters for printf format strings. (C99's <stdint.h> solution to that is preprocessor macros like PRIu32.)
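
To make that concrete, here is a hedged sketch (the DWORD typedef is an assumption for this example, matching the unsigned long definition Windows uses; uint32_t is unsigned int on typical implementations, so the two are distinct types for strict-aliasing purposes):

    #include <stdint.h>

    typedef unsigned long DWORD;   /* assumption for this sketch only */

    uint32_t read_write_read(uint32_t *vals, DWORD *alias)
    {
        uint32_t first = vals[0];
        *alias = 0;               /* even if alias points at vals[0], the compiler may
                                     assume a DWORD store can't modify a uint32_t object */
        return first + vals[0];   /* so it may reuse the vals[0] it already loaded */
    }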

Or maybe the idea was just to use names that match the asm, to make sure nobody was confused about the width of types. In the very early days of Windows, writing programs in asm directly, instead of C, was popular. WORD/DWORD makes the documentation clearer for people writing in asm.

Or maybe the idea was just to provide fixed-width types for portable code. e.g. #ifdef SUNOS: define it to an appropriate type for that platform. This is all it's good for at this point, as you noticed:

How is it that the Windows API can typedef unsigned short WORD; and then say WORD is a 16-bit unsigned integer when the C Standard itself does not guarantee that a short is always 16 bits?

You're correct, documenting the exact typedefs means that it's impossible to correctly implement the WinAPI headers in a system using a different ABI (e.g. one where long is 64bit or short is 32bit). This is part of the reason why the x86-64 Windows ABI makes long a 32bit type. The x86-64 System V ABI (Linux, OS X, etc.) makes long a 64bit type.
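
An easy way to see the data-model difference is to print the sizes yourself (illustrative program, nothing WinAPI-specific):

    #include <stdio.h>

    int main(void)
    {
        /* x86-64 Windows (LLP64): long is 4 bytes; x86-64 Linux/OS X (LP64): long is 8.
           short is 2 and pointers are 8 on both. */
        printf("short=%zu int=%zu long=%zu void*=%zu\n",
               sizeof(short), sizeof(int), sizeof(long), sizeof(void *));
        return 0;
    }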

Every platform does need a standard ABI, though. Struct layout, and even interpretation of function args, requires all code to agree on the size of the types used. Code from different versions of the same C compiler can interoperate, and even code from other compilers that follow the same ABI. (However, C++ ABIs aren't stable enough to standardize. For example, g++ has never standardized an ABI, and new versions do break ABI compatibility.)

Remember that the C standard only tells you what you can assume across every conforming C implementation. The C standard also says that signed integers might be sign/magnitude, one's complement, or two's complement. Any specific platform will use whatever representation the hardware does, though.

Platforms are free to standardize anything that the base C standard leaves undefined or implementation-defined. e.g. x86 C implementations allow unaligned pointers to exist, and even allow dereferencing them. This happens a lot with __m128i vector types.
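
For instance, a small sketch of the unaligned-load idiom with __m128i (illustrative; _mm_loadu_si128 is the SSE2 unaligned-load intrinsic, while _mm_load_si128 would require 16-byte alignment):

    #include <emmintrin.h>   /* SSE2 intrinsics, __m128i */
    #include <stddef.h>

    /* Load 16 bytes starting at an arbitrary byte offset into buf. */
    __m128i load_unaligned(const unsigned char *buf, size_t byte_offset)
    {
        return _mm_loadu_si128((const __m128i *)(buf + byte_offset));
    }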


The actual names chosen tie the WinAPI to its x86 heritage, and are unfortunately confusing to anyone not familiar with x86 asm, or at least Windows's 16bit DOS heritage.


The 8086 instruction mnemonics that include w for word and d for dword include cbw (sign-extend the byte in al to a word in ax) and cwd (sign-extend the word in ax to a dword in dx:ax), the latter commonly used as setup for idiv signed division.

These insns still exist and do exactly the same thing in 32-bit and 64-bit mode. (386 and x86-64 added extended versions, as you can see in Intel's insn set reference.) There are also the lodsw, rep movsw, etc. string instructions.

Besides those mnemonics, operand-size needs to be explicitly specified in some cases, e.g.
mov dword ptr [mem], -1, where neither operand is a register that can imply the operand-size. (To see what assembly language looks like, just disassemble something. e.g. on a Linux system, objdump -Mintel -d /bin/ls | less.)

So the terminology is all over the place in x86 asm, which is something you need to be familiar with when developing an ABI.


More x86 asm background, history, and current naming schemes

Nothing below this point has anything to do with WinAPI or the original question, but I thought it was interesting.


See also the x86 tag wiki for links to Intel's official PDFs (and lots of other good stuff). This terminology is still ubiquitous in Intel and AMD documentation and instruction mnemonics, because it's completely unambiguous in a document for a specific architecture that uses it consistently.

386 extended register sizes to 32 bits, and introduced the cdq instruction: cdq (eax (dword) -> edx:eax (qword)). (It also introduced movsx and movzx, to sign- or zero-extend without needing to get the data into eax first.) Anyway, quad-word is 64 bits, and was used even pre-386 for double-precision memory operands with fld qword ptr [mem] / fst qword ptr [mem].

Intel still uses this b/w/d/q/dq convention for vector instruction naming, so it's not at all something they're trying to phase out.

e.g. the pshufd insn mnemonic (_mm_shuffle_epi32 C intrinsic) is Packed (integer) Shuffle Dword. psraw is Packed Shift Right Arithmetic Word. (FP vector insns use a ps (packed single) or pd (packed double) suffix instead of p prefix.)

As vectors get wider and wider, the naming starts to get silly: e.g. _mm_unpacklo_epi64 is the intrinsic for the punpcklqdq instruction: Packed-integer Unpack Low Quad-words to Double-Quad (i.e. interleave the 64-bit low halves into one 128b vector). Or movdqu for Move Double-Quad Unaligned loads/stores (16 bytes). Some assemblers use o (oct-word) for declaring 16-byte integer constants, but Intel mnemonics and documentation always use dq.

Fortunately for our sanity, the AVX 256b (32B) instructions still use the SSE mnemonics, so vmovdqu ymm0, [rsi] is a 32B load, but there's no quad-quad terminology. Disassemblers that include operand-sizes even when it's not ambiguous would print vmovdqu ymm0, ymmword ptr [rsi].


Even the names of some AVX-512 extensions use the b/w/d/q terminology. AVX-512F (foundation) doesn't include all element-size versions of every instruction. The 8bit and 16bit element size versions of some instructions are only available on hardware that supports the AVX-512BW extension. There's also AVX-512DQ for extra dword and qword element-size instructions, including conversion between float/double and 64bit integers and a multiply with 64b x 64b => 64b element size.


A few new instructions use numeric sizes in the mnemonic

AVX's vinsertf128 / vextractf128 and similar, for inserting or extracting the high 128-bit lane of a 256-bit vector, could have used dq, but instead use 128.

AVX-512 introduces a few insn mnemonics with names like vmovdqa64 (vector load with masking at 64bit element granularity) or vshuff32x4 (shuffle 128b elements, with masking at 32bit element granularity).

Note that since AVX-512 has merge-masking or zero-masking for almost all instructions, even instructions that didn't used to care about element size (like pxor / _mm_xor_si128) now come in different sizes: _mm512_mask_xor_epi64 (vpxorq) (each mask bit affects a 64bit element), or _mm512_mask_xor_epi32 (vpxord). The no-mask intrinsic _mm512_xor_si512 could compile to either vpxorq or vpxord; it doesn't matter.

Most AVX512 new instructions still use b/w/d/q in their mnemonics, though, like VPERMT2D (full permute selecting elements from two source vectors).

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0

Currently there are no platforms that support Windows API but have unsigned short as not 16-bit.

If someone ever did make such a platform, the Windows API headers for that platform would not include the line typedef unsigned short WORD;.

You can think of MSDN pages as describing typical behaviour for MSVC++ on x86/x64 platforms.

M.M
  • 138,810
  • 21
  • 208
  • 365
  • almost all compiler implementations on other platforms also have 16-bit short. – phuclv May 19 '16 at 02:21
  • 1
    *"You can think of MSDN pages as describing typical behaviour for MSVC++"* - The MSDN describes Windows' ABI. This is not strictly limited to Microsoft's compiler, but any compiler targeting that platform. – IInspectable May 19 '16 at 06:18
0

The legacy of types like WORD predates Windows, going back to the days of MS-DOS and following the types defined by MASM (later renamed ML). MASM's signed types, such as SBYTE, SWORD, SDWORD, and SQWORD, were not adopted by the Windows API.

QWORD / SQWORD in MASM probably wasn't defined until MASM / ML supported 80386.

A current reference:

http://msdn.microsoft.com/en-us/library/8t163bt0.aspx

Windows added types such as HANDLE, WCHAR, TCHAR, ... .

For Windows / Microsoft compilers, size_t is an unsigned integer the same size as a pointer: 32 bits in 32-bit mode, 64 bits in 64-bit mode.

The DB and DW data directives in MASM go back to the days of Intel's 8080 assembler.

rcgldr
  • 27,407
  • 3
  • 36
  • 61