They detect the hardware capabilities at runtime (on the target PC). CPU features are usually detected with the CPUID instruction at the machine-code level (see a raw overview here). This instruction is generally accessed via built-in compiler intrinsics, e.g. __builtin_cpu_supports("sse3") for GCC (see Intrinsics for CPUID like informations as an example).
The GPU's capabilities are also detected at runtime, often through the DirectX/OpenGL APIs offered by the driver. For example, DirectX can fill a Caps structure (Microsoft.DirectX.Direct3D) with the relevant information. Another possibility is a database containing the capabilities of each graphics card (see DirectX capabilities on different graphics cards).
Therefore, even if a CPU/GPU can handle a larger set of specialized instructions, those instructions could not be included in the program, because that would make the program incompatible with other hardware.
That's not the case. Newer instructions not supported by older hardware can certainly be encoded in the executable. This does no harm and will not crash the program as long as those instructions are never executed. Hence the CPU feature detection at runtime: based on the detected features, the program chooses which code paths can be executed safely. The unused instructions are simply treated like any other non-executed data, such as pictures or text.
How do proprietary programs distributed as binaries exploit the specifics of the hardware they run on?
They do this by encoding the same functionality several times, each version optimized for a specific set of features/capabilities. At runtime, the fastest version the hardware supports is chosen for execution and the others are ignored.