While the RDNA architecture is optimized for wave32, the existing
wave64 mode can be more effective for some applications. To handle
wave64 instructions, the wave controller issues and executes two wave32 instructions, each operating on half of the work-items of the
wave64 instruction. The default way to handle a wave64 instruction is
simply to issue and execute the upper and lower halves of each
instruction back-to-back – conceptually slicing every instruction
horizontally.
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
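To make the "horizontal slicing" concrete, here is a minimal Python sketch of the behaviour described in the quote above. It is only an illustrative model, not AMD's hardware logic: the names execute_wave32 and execute_wave64 are hypothetical, and the lower-then-upper issue order is an assumption, since the whitepaper only says the two halves are issued back-to-back.

```python
WAVE32 = 32  # work-items handled by one wave32 instruction
WAVE64 = 64  # work-items in a wave64 wave


def execute_wave32(op, dst, src_a, src_b, lane_offset):
    """Apply `op` to 32 consecutive lanes starting at `lane_offset`."""
    for lane in range(lane_offset, lane_offset + WAVE32):
        dst[lane] = op(src_a[lane], src_b[lane])


def execute_wave64(op, src_a, src_b):
    """Model one wave64 instruction as two back-to-back wave32 issues.

    Returns the 64 results and the order in which the halves were issued.
    The lower-half-first order is an assumption made for this sketch.
    """
    dst = [None] * WAVE64
    issue_log = []
    for half, offset in (("lower", 0), ("upper", WAVE32)):
        execute_wave32(op, dst, src_a, src_b, offset)
        issue_log.append(half)
    return dst, issue_log


if __name__ == "__main__":
    a = list(range(WAVE64))
    b = [2] * WAVE64
    result, order = execute_wave64(lambda x, y: x * y, a, b)
    print(order)                   # halves issued for this one instruction
    print(result[0], result[63])   # first and last lane results
```

Each wave64 instruction touches all 64 lanes, but never more than 32 at a time, which is why the hardware can reuse its wave32 datapath for it.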
An example application: CAS
AMD’s FidelityFX suite includes a new approach known as Contrast
Adaptive Sharpening (CAS) that uses post-processing compute shaders
to enhance image quality. CAS enhances details at the interior of an
object, while maintaining the smooth gradients created by the
antialiasing as illustrated in Figure 12. It is a full-screen compute
shader and therefore can work with any type of anti-aliasing and is
particularly effective when paired with temporal antialiasing.
CAS is extremely fast, taking just 0.15 milliseconds for a 2560x1440
frame, and benefits from a variety of features in the RDNA
architecture such as packed integer math for address calculations,
packed fp16 math for compute, faster image loads, and wave32.
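For a sense of scale, the quoted figure can be turned into a throughput number with a little arithmetic. The sketch below uses only the 2560x1440 resolution and the 0.15 ms time from the quote. The 60 fps frame budget is an assumption added for comparison and does not come from the whitepaper.

```python
WIDTH, HEIGHT = 2560, 1440   # frame size from the quote
CAS_TIME_MS = 0.15           # CAS cost per frame from the quote
TARGET_FPS = 60              # assumed frame rate, for comparison only

pixels = WIDTH * HEIGHT
pixels_per_second = pixels / (CAS_TIME_MS / 1000.0)
frame_budget_ms = 1000.0 / TARGET_FPS
budget_share = CAS_TIME_MS / frame_budget_ms

print(f"pixels per frame : {pixels:,}")
print(f"throughput       : {pixels_per_second / 1e9:.3f} Gpixel/s")
print(f"share of a {TARGET_FPS} fps frame: {budget_share:.1%}")
```

Under these assumptions CAS processes about 3.7 million pixels per frame, roughly 24.6 Gpixel/s, and uses just under 1% of a 60 fps frame budget.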