20

Is there a more-or-less reliable way (not necessarily perfect) to detect the machine word size of the target architecture for which I'm compiling?

By machine word size I mean the size of the integer accumulator register (e.g. EAX on x86, RAX on x86_64 etc., not streaming extensions, segment or floating-point registers).

The standard does not seem to provide a "machine word" data type. So I'm not looking for a 100% portable way, just something that works in most common cases (Intel x86 Pentium+, ARM, MIPS, PPC - that is, register-based, contemporary commodity processors).

size_t and uintptr_t sound like good candidates (and in practice matched the register size everywhere I tested), but they are defined for other purposes and are thus not guaranteed to always match it, as already described in "Is size_t the word size?".

Context

Let's assume I'm implementing a hashing loop over a block of contiguous data. It is OK to have the resulting hash depend on the compiler, only speed matters.

Example: http://rextester.com/VSANH87912

Testing on Windows shows that hashing in 64-bit chunks is faster in 64-bit mode, while hashing in 32-bit chunks is faster in 32-bit mode:

64-bit mode
int64: 55 ms
int32: 111 ms

32-bit mode
int64: 252 ms
int32: 158 ms
Community
rustyx
  • 2
    Possible duplicate of [Is size\_t the word size?](http://stackoverflow.com/questions/14792068/is-size-t-the-word-size) – hdl Mar 07 '16 at 12:12
  • 1
    You may still mix information from compiler and architecture which have specific macro. – Jarod42 Mar 07 '16 at 12:14
  • `sizeof(long)` is reasonable. It's 32 or 64 bits depending on what word size you compile for. – stark Mar 07 '16 at 12:15
  • 5
    @stark sizeof(long) is 4 (32 bits) on the 64-bit Windows data model. – interjay Mar 07 '16 at 12:26
  • 3
    Re the context - the question as it is is fairly clear - how can I determine the machine word size. The answer can then be applied to a multitude of different contexts. – rustyx Mar 07 '16 at 13:45
  • 1
    Use [this](https://sourceforge.net/p/predef/wiki/Architectures/) with some macro magic of your own. – user1095108 Mar 07 '16 at 13:50
  • 1
    _"Is there a more-or-less reliable way (not necessarily perfect)"_ What do you mean by "reliable"? If `T` turned out not to be it, what should happen? – edmz Mar 07 '16 at 14:03
  • 1
    I think if you shared why you care, you could get better answers. For example, assuming the reason you care has to do with optimization, you could try various word sizes and measure the performance. – GroovyDotCom Mar 07 '16 at 14:07
  • 4
    Why would you need to know? What do you want to do with this information? If I told you an incorrect word size on some platform, how would it affect whatever you're trying to do? – n. m. could be an AI Mar 07 '16 at 14:42
  • 1
    You cannot do that. The standard does not guarantee you that the code is running on some hardware computer. You might in principle use a bunch of human slaves to run your C++ code (but that is unethical). In other words, *word size* does not always make sense. More seriously, one could imagine a *bit-addressable* computer – Basile Starynkevitch Mar 08 '16 at 10:16
  • 1
    Whatever you do, don't forget to take into account that `CHAR_BIT` might not be equal to `8`. –  Mar 08 '16 at 10:28
  • 1
    Some VLIW or DSP have weird word sizes, or some embedded processors. I don't know all of them, so I cannot show you one. BTW, on Cray1 (in early 1990), the word size was not very well defined: pointers to doubles and pointers to chars had different sizes! – Basile Starynkevitch Mar 08 '16 at 10:29
  • 1
    @rustyx *Please spare me the standard rhetoric. C/C++ is used its low-levelness. Show me a computer which does not have a register size.* Show me a computer where you can ask "How big is this register?" without already knowing enough about the architecture to identify the register you're asking about. You can't even ask this question in a way that returns an accurate answer without having enough information to know the answer. **N.B. the question already states "EAX on x86, RAX on x86_64 etc"** In other words, "How big is this specific register on this specific hardware?" It's known. – Andrew Henle Mar 08 '16 at 11:46
  • Some processors don't have `EAX`. And some weird processors don't even have any registers. – Basile Starynkevitch Jan 15 '18 at 05:52

6 Answers

13

Because the C and C++ languages deliberately abstract away such considerations as the machine word size, it's unlikely that any method will be 100% reliable. However, there are the various int_fastXX_t types that may help you infer the size. For example, this simple C++ program:

#include <iostream>
#include <cstdint>

// Print an expression's source text followed by its value.
#define SHOW(x) std::cout << #x " = " << (x) << '\n'

int main()
{
    SHOW(sizeof(int_fast8_t));
    SHOW(sizeof(int_fast16_t));
    SHOW(sizeof(int_fast32_t));
    SHOW(sizeof(int_fast64_t));
}

produces this result using gcc version 5.3.1 on my 64-bit Linux machine:

sizeof(int_fast8_t) = 1
sizeof(int_fast16_t) = 8
sizeof(int_fast32_t) = 8
sizeof(int_fast64_t) = 8

This suggests that one way to discover the register size might be to find the int_fastXX_t type whose size exceeds its required size by the largest margin (e.g. 8 bytes used for a 16-bit value that only requires 2 bytes) and to use the size of that int_fastXX_t type as the register size.

Further results

Windows 7, gcc 4.9.3 under Cygwin on 64-bit machine: same as above

Windows 7, Visual Studio 2013 (v 12.0) on 64-bit machine:

sizeof(int_fast8_t) = 1
sizeof(int_fast16_t) = 4
sizeof(int_fast32_t) = 4
sizeof(int_fast64_t) = 8

Linux, gcc 4.6.3 on 32-bit ARM and also Linux, gcc 5.3.1 on 32-bit Atom:

sizeof(int_fast8_t) = 1
sizeof(int_fast16_t) = 4
sizeof(int_fast32_t) = 4
sizeof(int_fast64_t) = 8
Edward
  • Nice suggestion. Too bad this doesn't work in Windows (prints the same values on 32-bit and 64-bit platform). – rustyx Mar 07 '16 at 14:08
  • @rustyx: You're right. I've added to my answer to show that. – Edward Mar 07 '16 at 14:31
  • Your result may be the result of which compiler you used. it is possible to use a 32 bit compiler on a 64 machine. The result will run. – Robert Jacobs Mar 07 '16 at 14:40
  • 2
    @RobertJacobs: The result is **entirely** the result of which compiler is used, so it's much more of a compiler test than a CPU test. – Edward Mar 07 '16 at 14:41
  • @Edward That is why I asked your compiler settings on the Windows 7 Visual Studio 2013 test. Is the result a 32 or 64 bit executable? – Robert Jacobs Mar 07 '16 at 15:00
  • Command line used for VS: `cl /EHsc /O2 fast.cpp`. The result is a 64-bit executable. – Edward Mar 07 '16 at 15:04
  • @Edward The compiler targets a particular CPU though. – M.M Mar 08 '16 at 10:52
  • unfortunately this relies on the existence of `int_fast8/16/32/64_t` so it won't work on systems with e.g. 24-bit registers – phuclv Jan 24 '18 at 03:23
  • Would suggest warning against using `int_fast` for actually fast word sizes on systems that use `GLIBC`. They are mistuned for some popular architectures (`x86_64`/`armv8`/basically anything other than `alpha`) and unchangeable due to ABI concerns. – Noah Apr 17 '22 at 17:30
11

I think you want

sizeof(size_t), which is supposed to be the size of an index, i.e. of index in ar[index].

32 bit machine

char 1
int 4
long 4
long long 8
size_t 4

64 bit machine

char 1
int 4
long 8
long long 8
size_t 8

It may be more complicated because 32-bit compilers also run on 64-bit machines. Their output is 32-bit even though the machine is capable of more.

I added results for Windows compilers below.

Visual Studio 2012 compiled win32

char 1
int 4
long 4
long long 8
size_t 4

Visual Studio 2012 compiled x64

char 1
int 4
long 4
long long 8
size_t 8
Jaymin Panchal
Robert Jacobs
  • So which one would I use as the machine word size? – rustyx Mar 07 '16 at 14:09
  • 1
    I would go with size_t since array access is inherently a register operation. – Robert Jacobs Mar 07 '16 at 14:13
  • 5
    `size_t` is supposed to be "large enough to contain the size in bytes of any object" [support.types]/6. It has nothing to do with array access. – Pete Becker Mar 07 '16 at 15:48
  • @PeteBecker See http://en.cppreference.com/w/cpp/types/size_t. "size_t can store the maximum size of a theoretically possible object of any type (including array)". Also "size_t is commonly used for array indexing and loop counting." There is a reason malloc takes a size_t as an argument. – Robert Jacobs Mar 07 '16 at 15:54
  • 4
    @RobertJacobs - yes, `malloc` takes an argument of type `size_t`. That's because `size_t` can represent the size of any object; that's what the standard requires, which is why I quoted the standard. Array indexing is about **pointer** ranges; if you're indexing into an array of `char` your index type had better be big enough to hold the maximum size of a `char` array, even if a single register is not large enough to hold such an index value. – Pete Becker Mar 07 '16 at 16:17
  • Even with the x86-64, there's the [x32 ABI](https://en.wikipedia.org/wiki/X32_ABI). It hasn't really caught on, but it's an ILP32 convention where `size_t` is 32 bits, though it's still a 64-bit 'mode'. MIPS and SGI did this is the late 90s with the N32 ABI. – Brett Hale Jun 24 '16 at 18:28
  • 1
    As soon as you're running a 32-bit compiler on 64-bit hardware you have to ask "What is the question even asking?" and "Why?". – Persixty Jan 23 '18 at 15:03
4

Even in machine architecture a word may be multiple things. AFAIK you have different hardware related quantities:

  • character: generally speaking it is the smallest element that can be exchanged to or from memory - it is now almost everywhere 8 bits but used to be 6 on some older architectures (CDC in the early 80s)
  • integer: an integer register (e.g. EAX on x86). IMHO an acceptable approximation is sizeof(int)
  • address: what can be addressed on the architecture. IMHO an acceptable approximation is sizeof(uintptr_t)
  • not speaking of floating points...

Let's do some history:

Machine class     | character | integer     | address
------------------|-----------|-------------|------------------
old CDC           | 6 bits    | 60 bits     | ?
8086              | 8 bits    | 16 bits     | 2x16 bits (*)
80x86 (x >= 3)    | 8 bits    | 32 bits     | 32 bits
64-bit machines   | 8 bits    | 32 bits     | 64 bits
                  |           |             |
general case (**) | 8 bits    | sizeof(int) | sizeof(uintptr_t)

(*) It was a special addressing mode where the high word (the segment) was shifted by only 4 bits to produce a 20-bit address, but far pointers used to be 32 bits long.

(**) uintptr_t does not make much sense on old architectures because the compilers (when they existed) did not support that type. But if a decent compiler were ported to them, I assume that the values would be those shown.

But BEWARE: the types are defined by the compiler, not the architecture. That means that if you ran a compiler targeting an 8-bit architecture on a 64-bit machine, you would probably get 16 bits for both int and uintptr_t (that is, sizeof(int) * CHAR_BIT = 16 and sizeof(uintptr_t) * CHAR_BIT = 16). So the above only makes sense if you use a compiler adapted to the architecture...

Serge Ballesta
  • `sizeof(int)` is not an acceptable approximation on x86_64. There it is still 4 whereas RAX is 8 bytes. – rustyx Mar 08 '16 at 10:15
  • @rustyx x86_64 is a single architecture with fixed integer accumulation register sizes. What approximation on x86_64 are you talking about? :D – jotik Mar 08 '16 at 11:02
4

I'll give you the right answer to the question you should be asking:

Q: How do I choose the fastest hash routine for a particular machine if I don't have to use a particular one and it doesn't have to be the same except within a single build (or maybe run) of an application?

A: Implement a parametrized hashing routine, possibly using a variety of primitives including SIMD instructions. On a given piece of hardware, some set of these will work and you will want to enumerate that set using some combination of compile-time #ifdefs and dynamic CPU feature detection. (E.g. you can't use AVX2 on any ARM processor, which is determined at compile time, and you can't use it on older x86, which is determined at run time by the CPUID instruction.) Take the set that works and time its members on test data on the machines of interest. Either do so dynamically at system/application startup, or test as many cases as you can and hardcode which routine to use on which system based on some sniffing algorithm. (E.g. the Linux kernel does this to determine the fastest memcpy routine, etc.)

The circumstances under which you need the hash to be consistent will be application dependent. If you need the choice to be entirely at compile time, then you'll need to craft a set of preprocessor macros the compiler defines. Often it is possible to have multiple implementations that produce the same hash but using different hardware approaches for different sizes.

Skipping SIMD is probably not a good idea if you are defining a new hash and want it to be really fast, though it may be possible in some applications to saturate the memory speed without using SIMD so it doesn't matter.

If all of that sounds like too much work, use size_t as the accumulator size. Or use the largest size for which std::atomic tells you the type is lock free. See: std::atomic_is_lock_free, std::atomic::is_lock_free, or std::atomic::is_always_lock_free.

Toby Speight
Zalman Stern
3

By "machine word size" we'll have to assume that the meaning is: the largest size of a piece of data that the CPU can process in a single instruction. (Sometimes called data bus width, although that's a simplification.)

On various CPUs, size_t, uintptr_t and ptrdiff_t could be anything. These types are related to the address bus width rather than the CPU data width, so we can forget about them: they don't tell us anything.

On all mainstream CPUs, char is always 8 bits, short is always 16 bits and long long is always 64 bits. So the only interesting types remaining are int and long.


The following mainstream CPUs do exist:

8 bits

int   = 16 bits   
long  = 32 bits

16 bits

int   = 16 bits   
long  = 32 bits

32 bits

int   = 32 bits   
long  = 32 bits

64 bits

int   = 32 bits   
long  = 32 bits (e.g. 64-bit Windows) or 64 bits (e.g. 64-bit Linux)

Unconventional variations to the above may exist, but generally there's no telling from the above how to distinguish 8-bit from 16-bit or 32-bit from 64-bit.

Alignment is no help to us either, because it may or may not apply to various CPUs. Many CPUs can read misaligned words just fine, but at the expense of slower code.

So there is no way to tell the "machine word size" by using standard C.


It is however possible to write fully portable C that can run on anything between 8 and 64 bits, by using the types from stdint.h, notably the uint_fast types. Some things to keep in mind are:

  • Implicit integer promotions across different systems. Anything of uint32_t or larger is generally safe and portable.
  • The default type of integer constants ("literals"). This is most often (but not always) int, and what an int is on a given system may vary.
  • Alignment and struct/union padding.
  • Pointer size is not necessarily the same as machine word size. Particularly true on many 8, 16 and 64 bit computers.
Lundin
0

Choose sizeof(int *) * CHAR_BIT to get machine architecture size in bits.

The reason is that the architecture may be segmented, and size_t gives the maximum size of a single object (which might be what you want, but is not the same thing as the machine's natural bit size). If CHAR_BIT is 8 but the underlying bytes are not 8 bits, character and void pointers may have extra bits to allow them to address 8-bit units. int * is most unlikely to have such padding. CHAR_BIT may not be 8, however.

Malcolm McLean
  • Great and correct answer, in my opinion. Some platforms have more than 8 bits per character, hence, CHAR_BIT is a must as you noticed. – VladP Jan 03 '18 at 21:02
  • Correct me if I'm wrong but size of pointers and size of data registers are different things (regardless `CHAR_BIT`, BTW). For example you might have (generally speaking) a 16 bit address space (addressable without segmentation...) and 8 bit registers. Just to mention something well-known you can think about 8008 (and many many other modern microcontrollers). Things may even be more complicate (8080 and successors). A modern (and explicitly mentioned by OP) PPC (with e200z7 cores) is for example a 32 bit CPU with 64 bit general purpose registers. – Adriano Repetti Jan 04 '18 at 15:27
  • 1
    The answer is incorrect since this isn't true on _any_ of the numerous 8 bit MCU:s out there: AVR, PIC, HC08, R8C, 8051, Z80... At least the former 4 are in mass production still. – Lundin Jan 23 '18 at 15:53