CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model.
CUB (CUDA UnBound) is a C++ template library of components for use on NVIDIA GPUs running CUDA.
CUB includes common data parallel operations such as prefix scan, reduction, histogram and sort. CUB's collective primitives are not bound to any particular width of parallelism or to any particular data type and can be used at device, block, warp or thread scope.
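As an illustration of a block-scope collective, the following is a minimal sketch rather than a canonical usage pattern. It uses `cub::BlockReduce` to sum one value per thread within a single thread block. The block size `kBlockThreads`, the kernel `block_sum_kernel`, and the host helper `block_sum_on_device` are hypothetical names introduced for this example. It assumes the input holds at least `kBlockThreads` elements, and error checking of CUDA calls is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical block size chosen for this sketch.
constexpr int kBlockThreads = 128;

// Each block sums kBlockThreads ints with CUB's block-scope reduction.
__global__ void block_sum_kernel(const int* in, int* out)
{
    // Specialize the collective for the data type and block width.
    using BlockReduce = cub::BlockReduce<int, kBlockThreads>;

    // Shared memory the collective needs for inter-thread communication.
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_value = in[blockIdx.x * kBlockThreads + threadIdx.x];

    // The returned aggregate is only valid in thread 0 of the block.
    int block_sum = BlockReduce(temp_storage).Sum(thread_value);

    if (threadIdx.x == 0) {
        out[blockIdx.x] = block_sum;
    }
}

// Hypothetical host helper: sums the first kBlockThreads values of host_in
// using a single thread block. CUDA error checking is omitted for brevity.
int block_sum_on_device(const std::vector<int>& host_in)
{
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, kBlockThreads * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in.data(), kBlockThreads * sizeof(int),
               cudaMemcpyHostToDevice);

    block_sum_kernel<<<1, kBlockThreads>>>(d_in, d_out);

    int result = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    std::vector<int> ones(kBlockThreads, 1);
    std::printf("block sum = %d\n", block_sum_on_device(ones));
    return 0;
}
```

The same pattern (specialize the template, allocate its `TempStorage`, call the collective) applies to the other block-scope and warp-scope primitives. Device-scope algorithms such as `cub::DeviceReduce` are instead called from host code.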
It is used in the backend of other NVIDIA libraries, most prominently Thrust and RAPIDS.
CUB is developed by NVIDIA Research. Its website and documentation are hosted at https://nvlabs.github.io/cub, and the most recent source code is available on GitHub. It has also been distributed with the CUDA Toolkit since at least CUDA 11.1.1 (the first version in which the CUB documentation is linked from the CUDA Toolkit documentation).