90

I suppose I am focussing on x86, but I am generally interested in the move from 32 to 64 bit.

Logically, I can see that constants and pointers will, in some cases, be larger, so programs are likely to be larger. And the desire to allocate memory on word boundaries for efficiency would mean more padding between allocations.

I have also heard that 32 bit mode on the x86 has to flush its cache when context switching due to possible overlapping 4G address spaces.

So, what are the real benefits of 64 bit?

And as a supplementary question, would 128 bit be even better?

Edit:

I have just written my first 32/64 bit program. It makes linked lists/trees of 16 byte (32b version) or 32 byte (64b version) objects and does a lot of printing to stderr - not a really useful program, and not something typical, but it is my first.

Size: 81128(32b) v 83672(64b) - so not much difference

Speed: 17s(32b) v 24s(64b) - running on 32 bit OS (OS-X 10.5.8)

Update:

I note that a new hybrid x32 ABI (Application Binary Interface) is being developed that is 64b but uses 32b pointers. For some tests it results in smaller code and faster execution than either 32b or 64b.

https://sites.google.com/site/x32abi/

philcolbourn
  • 4,042
  • 3
  • 28
  • 33
  • 1
    Seems like a duplicate of http://stackoverflow.com/questions/324015/supplying-64-bit-specific-versions-of-your-software – Suma Mar 04 '10 at 10:55
  • 1
    And mine froma few days back: http://stackoverflow.com/questions/2334148/is-there-any-real-point-compiling-a-windows-application-as-64-bit – Mr. Boy Mar 04 '10 at 11:25
  • There is some overlap I agree, but no takers on the CPU cache and 128 bit parts yet. Thanks Suma and John for the links. – philcolbourn Mar 04 '10 at 11:42
  • Take a look at http://stackoverflow.com/questions/607322/what-are-the-advantages-of-a-64-bit-processor – Sean Mar 05 '10 at 08:59
  • "I have also heard that 32 bit mode on the x86 has to flush its cache when context switching due to possible overlapping 4G address spaces." Can you please point me to a reference that talks about this? – gkb0986 Sep 05 '13 at 13:27

9 Answers

49

I typically see a 30% speed improvement for compute-intensive code on x86-64 compared to x86. This is most likely due to the fact that we have 16 x 64 bit general purpose registers and 16 x SSE registers instead of 8 x 32 bit general purpose registers and 8 x SSE registers. This is with the Intel ICC compiler (11.1) on an x86-64 Linux - results with other compilers (e.g. gcc), or with other operating systems (e.g. Windows), may be different of course.

Paul R
  • 208,748
  • 37
  • 389
  • 560
  • 1
    By 'compute intensive' do you mean graphics, matrix, DFTs? – philcolbourn Mar 04 '10 at 11:49
  • 4
    @phil: yes, mainly image processing, mostly integer (fixed point), lots of SIMD code, etc. – Paul R Mar 04 '10 at 12:36
  • I've observed that 64-bit compilers use the SSE registers while 32-bit compilers use the standard ALU. This makes 64-bit code faster due to the narrower FP width (64 vs 80) plus additional instructions. – IamIC Sep 29 '16 at 04:28
32

Unless you need to access more memory than 32b addressing allows, the benefits will be small, if any.

When running on a 64b CPU, you get the same memory interface whether you are running 32b or 64b code (you are using the same cache and the same bus).

While the x64 architecture has a few more registers, which allows easier optimizations, this is often counteracted by the fact that pointers are now larger, so using any structures containing pointers results in higher memory traffic. I would estimate the increase in overall memory usage for a 64b application compared to a 32b one to be around 15-30%.

Suma
  • 33,181
  • 16
  • 123
  • 191
  • 2
    What is your view on proposed x32 ABI? – philcolbourn May 11 '12 at 11:22
  • I think memcpy and strcpy will be faster than on a 32 bit CPU, because it will read one word at a time, and a word is 8 bytes on a 64 bit CPU – Mark Ma May 13 '16 at 08:12
  • @MarkMa: If you use scalar integer registers for memcpy / strcpy on modern x86, you're doing it wrong. 32-bit and 64-bit code should be using 16-byte XMM or 32-byte YMM registers for a memcpy of 15 bytes or larger. (And for 8 to 15 byte copies, 32-bit code can use `movq xmm0, [edx]` / `movq [eax], xmm0` for 8-byte ops, to still use glibc's strategy of two partially-overlapping loads / stores.) That would be true on a 32-bit microcontroller without SIMD, though, like an ARM Cortex-M if its load-multiple or load-pair instruction don't allow loading 8 bytes as fast as it can load 4. – Peter Cordes Oct 06 '22 at 13:49
18

Regardless of the benefits, I would suggest that you always compile your program for the system's default word size (32-bit or 64-bit). If you compile a library as a 32-bit binary and provide it on a 64-bit system, you will force anyone who wants to link against it to supply their own library (and every other library dependency) as a 32-bit binary as well, even though the 64-bit versions are the default available. This can be quite a nuisance for everyone. When in doubt, provide both versions of your library.

As to the practical benefits of 64-bit... the most obvious is that you get a bigger address space, so if you mmap a file, you can address more of it at once (and load larger files into memory). Another benefit is that, assuming the compiler does a good job of optimizing, many of your arithmetic operations can be parallelized (for example, placing two pairs of 32-bit numbers in two registers and performing two adds in a single add operation), and big-number computations will run more quickly. That said, the whole 64-bit vs 32-bit thing won't help you with asymptotic complexity at all, so if you are looking to optimize your code, you should probably be looking at the algorithms rather than constant factors like this.

EDIT:
Please disregard my statement about the parallelized addition. That is not performed by an ordinary add instruction; I was confusing it with some of the vectorized/SSE instructions. A more accurate benefit, aside from the larger address space, is that there are more general purpose registers, which means more local variables can be kept in the CPU register file, which is much faster to access than placing the variables on the program stack (which usually means going out to the L1 cache).

Michael Aaron Safyan
  • 93,612
  • 16
  • 138
  • 200
  • > "for example, placing two pairs of 32-bit numbers in two registers and performing two adds in single add operation" Is there any compiler out there doing this? Also, is seems the same could be done on x86 using SSE instructions. – Suma Mar 04 '10 at 10:51
  • Thinking about such "two adds in one" more, it is a nonsense and no compiler can do it as an optimization, because addition from lower 32b could overflow into higher 32b. You need SIMD instructions for this. – Suma Mar 04 '10 at 10:54
  • I guess if you were keen you could do multiple 16 bit arithmetic in 64 bit registers. Would seem to be messy, but I bet it has been done. – philcolbourn Mar 04 '10 at 11:45
  • 'Constant Factors' - sound's like something Brian Harvey would say. – philcolbourn Mar 04 '10 at 11:47
8

I'm writing a chess engine named foolsmate. Extracting the best move using a minimax-based tree search to depth 9 (from a certain position) took:

on Win32 configuration: ~17.0s;

after switching to x64 configuration: ~10.3s;

That's a speedup of roughly 40%!

bloody
  • 1,131
  • 11
  • 17
  • 1
    Can you elaborate *why* this might be? – Shidouuu Jun 09 '21 at 19:37
  • @Shidouuu I think [Paul R's answer](https://stackoverflow.com/a/2378772/4241078) says most of it (the number of CPU registers...). My answer was intended to be a purely comparative report from testing. – bloody Oct 11 '21 at 12:13
  • 1
    Chess engines using 64-bit integers to represent bitmaps of board state are one of the special cases that most heavily favours 64-bit CPUs. A lot of real-world code only uses `long` or `size_t` for most things, so doesn't do a lot of 64-bit integer stuff on 32-bit CPUs. So they mostly "only" benefit from x86-64 having more integer and vector registers, more modern calling conventions, and guaranteed SSE2. – Peter Cordes Oct 06 '22 at 13:46
  • @PeterCordes Good point, thank you for your insight. It's still didactic to know for which purposes we may benefit most by 64-bit arch and for which to some extent. Certainly there are more applications for 64-bit arithmetic than chess but that's another story. – bloody Oct 06 '22 at 21:48
6

In addition to having more registers, 64-bit has SSE2 by default. This means that you can indeed perform some calculations in parallel. The SSE extensions had other goodies too. But I guess the main benefit is not having to check for the presence of the extensions. If it's x64, it has SSE2 available. ...If my memory serves me correctly.

amokcrow
  • 61
  • 1
  • 1
2

The only justification for moving your application to 64 bit is the need for more memory, as in large databases or ERP applications with at least hundreds of concurrent users, where the 2 GB limit will be exceeded fairly quickly once the application caches data for better performance. This is especially the case on Windows, where int and long are still 32 bit (there is a new type, _int64; only pointers are 64 bit). In fact, WOW64 is highly optimised on Windows x64, so 32 bit applications run with a low penalty on a 64 bit Windows OS.

In my experience on Windows x64, the 32 bit version of an application runs 10-15% faster than the 64 bit version, since in the former case, at least for proprietary in-memory databases, you can use pointer arithmetic for maintaining the b-tree (the most processor-intensive part of database systems).

Computation-intensive applications which require large decimals for accuracy beyond what double affords can use _int64 natively instead of software emulation. Of course, large disk-based databases will also show improvement over 32 bit, simply due to the ability to use more memory for caching query plans and so on.

GirishK
  • 29
  • 1
  • First, `int` remains 32-bit everywhere, regardless of the word size of the execution environment. For what compiler is `long` still 32-bit when compiling for 64-bit? Are you claiming that MSVC does this? AFAIK, this is even [roughly] covered in the C++11 standard: `sizeof(long) == sizeof(void*)` Please, somebody, correct me if I'm wrong, as I don't have easy access to MSVC. – Matthew Hall Dec 02 '12 at 07:01
  • 3
    @Matthew Hall:Its windows 64 bit operating system standard and therefor MSVC follows this LLP64 model (vs LP64 for Unix variants). Refer (http://msdn.microsoft.com/en-us/library/3b2e7499(v=vs.100).aspx). – GirishK Dec 02 '12 at 18:13
2

In the specific case of x86 to x86_64, the 64 bit program will be about the same size, if not slightly smaller, use a bit more memory, and run faster. Mostly this is because x86_64 doesn't just have 64 bit registers, it also has twice as many. x86 does not have enough registers to make compiled languages as efficient as they could be, so x86 code spends a lot of instructions and memory bandwidth shifting data back and forth between registers and memory. x86_64 has much less of that, so it takes a little less space and runs faster. Floating point and bit-twiddling vector instructions are also much more efficient in x86_64.

In general, though, 64 bit code is not necessarily any faster, and is usually larger, both for code and memory usage at runtime.

Andrew McGregor
  • 31,730
  • 2
  • 29
  • 28
  • 2
    I don't quite get the point you're making. Initially (first sentence) you say that 64 bit programs will generally run faster but then your last sentence seems to be backpedalling all that to say "not really" – Motorhead Jan 07 '19 at 03:17
  • @N.S. He never said significantly faster, it could be a very small difference. – Shidouuu Jun 09 '21 at 19:42
1

Any applications that require CPU usage such as transcoding, display performance and media rendering, whether it be audio or visual, will certainly require (at this point) and benefit from using 64 bit versus 32 bit due to the CPU's ability to deal with the sheer amount of data being thrown at it. It's not so much a question of address space as it is the way the data is being dealt with. A 64 bit processor, given 64 bit code, is going to perform better, especially with mathematically difficult things like transcoding and VoIP data - in fact, any sort of 'math' applications should benefit by the usage of 64 bit CPUs and operating systems. Prove me wrong.

  • No, it won't. Only if the RAM requirement exceeds 4 GB will it be faster. You can easily search an array of 1000 million integers in less than 4 GB of data on a 32 bit architecture, so using a 64 bit machine there will slow things down. – sapy Feb 11 '18 at 00:46
0

On my machine, the same H.265 encode runs almost twice as fast using VirtualDub x64 (with the x64 H.265 library) as with VirtualDub x32 (the regular x32 H.265 library). That's probably because operations (e.g. add) on 64-bit integers can be done in a single instruction on x64, but on 32 bit need two: add the lower part, then add (with carry) the higher part. So unless integer maths is limited to 32-bit integers, most of it will take longer under x32.

isidroco
  • 3
  • 2