They detect the hardware capabilities at runtime (on the target PC). CPU features are usually detected with the CPUID instruction at the machine-code level (see a raw overview here). This instruction is generally accessed via built-in compiler intrinsics, e.g. __builtin_cpu_supports("sse3") for GCC (see Intrinsics for CPUID like informations as an example).
The GPU's capabilities are also detected at runtime, often through the DirectX/OpenGL APIs offered by the driver. For example, DirectX can fill a Caps structure (Microsoft.DirectX.Direct3D) with the relevant information. Another possibility is a database containing the capabilities of each graphics card (see DirectX capabilities on different graphics cards).
Therefore, even if a CPU/GPU can handle a larger set of specialized instructions, those instructions could not be included in the program, because that would make the program incompatible with other hardware.
That's not the case. Newer instructions not supported by older hardware can certainly be encoded in the executable. This does no harm and will not crash the program as long as those instructions are never executed. Hence the CPU feature detection at runtime: based on the detected features, the program chooses which code paths can be executed safely. The unused instructions are simply treated like any other non-executed data, such as pictures or text.
How do proprietary programs distributed as binaries exploit the specifics of the hardware they run on?
They do this by encoding the same functionality several times, each version optimized for a specific set of features/capabilities. At runtime, the fastest version the hardware supports is chosen for execution and the others are ignored.