Questions tagged [cub]

CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model.

CUB (CUDA UnBound) is a C++ template library of components for use on NVIDIA GPUs running CUDA.

CUB includes common data-parallel operations such as prefix scan, reduction, histogram, and sort. CUB's collective primitives are not bound to any particular width of parallelism or to any particular data type, and they can be used at device, block, warp, or thread scope.

It is used in the backend of other NVIDIA libraries, most prominently Thrust and RAPIDS.

CUB is developed by NVIDIA Research. Its website and documentation are hosted at https://nvlabs.github.io/cub, and the most recent source code is available on GitHub. It has also been distributed with the CUDA Toolkit since at least CUDA 11.1.1 (the first version in which the CUB documentation is linked from the CUDA Toolkit documentation).
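As a rough illustration of device-scope usage, the sketch below sums an array with `cub::DeviceReduce::Sum`. It shows the two-call pattern that CUB's device-wide primitives use: the first call, made with a null temporary-storage pointer, only reports how many bytes are needed, and the second call does the work. The helper name `device_sum` is hypothetical and not part of CUB, and checking of the returned `cudaError_t` values is left out for brevity, so treat this as a minimal sketch rather than production code.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUB): sums host_in on the GPU.
// Error checking of CUDA/CUB return codes is omitted for brevity.
int device_sum(const std::vector<int>& host_in) {
    const int num_items = static_cast<int>(host_in.size());

    // Copy the input to the device and reserve one int for the result.
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, num_items * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in.data(), num_items * sizeof(int),
               cudaMemcpyHostToDevice);

    // First call: d_temp_storage is null, so CUB only writes the
    // required size into temp_storage_bytes and does no reduction.
    void* d_temp_storage = nullptr;
    std::size_t temp_storage_bytes = 0;
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                           d_in, d_out, num_items);

    // Second call: with the storage allocated, the reduction runs.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                           d_in, d_out, num_items);

    // Copy the single result back; this blocking copy on the default
    // stream completes only after the reduction has finished.
    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    // The caller owns the temporary storage and must free it.
    cudaFree(d_temp_storage);
    cudaFree(d_out);
    cudaFree(d_in);
    return result;
}
```

Calling `device_sum({1, 2, 3, 4})` from host code compiled with `nvcc` would be one way to try it. The other device-wide primitives, such as `cub::DeviceScan` and `cub::DeviceRadixSort`, follow the same size-query-then-run convention.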

48 questions

votes

3 answers

Block reduction in CUDA

I am trying to do reduction in CUDA and I am really a newbie. I am currently studying a sample code from NVIDIA. I guess I am really not sure how to set up the block size and grid size, especially when my input array is larger (512 X 512) than a…

asked Apr 08 '14 at 13:47

Ono

1,357
3
16
38

votes

4 answers

Sorting (small) arrays by key in CUDA

I'm trying to write a function that takes a block of unsorted key/value pairs such as <7, 4> <2, 8> <3, 1> <2, 2> <1, 5> <7, 1> <3, 8> <7, 2> and sorts them by key while reducing the values of pairs with the same key: <1, 5> <2, 10> <3, 9> <7,…

sorting cuda parallel-processing reduce cub

asked Jul 11 '13 at 15:19

user1743798

votes

3 answers

Cost functional calculation for global optimization in CUDA

I am trying to optimize a function (say find the minimum) with n parameters (Xn). All Xi's are bound to a certain range (for example -200 to 200) and if any parameter leaves this range, the function goes to infinity very fast. However, n can be…

optimization cuda genetic-algorithm nonlinear-optimization cub

asked Jul 03 '12 at 20:33

Violin Yanev

1,507
2
16
23

votes

1 answer

What is the usual way to use a modified C++ header-only library in my own open source project?

I want to use a modified C++ header library in my own open source project, but I am not sure what the usual way to do it is. For example, to use the original header library "CUB" in my project, I only need to: download CUB; include the "umbrella" header…

c++ header-files cub

asked Aug 24 '20 at 06:29

Jason7525

votes

3 answers

Sorting many small arrays in CUDA

I am implementing a median filter in CUDA. For a particular pixel, I extract its neighbors corresponding to a window around the pixel, say an N x N (3 x 3) window, and now have an array of N x N elements. I do not envision using a window of more than…

sorting cuda cub

asked Mar 12 '14 at 01:00

Eagle

1,187
5
22
40

votes

2 answers

CUDA reduction of many small, unequally sized arrays

I am wondering if anyone could suggest the best approach to computing the mean / standard deviation of a large number of relatively small but differently sized arrays in CUDA? The parallel reduction example in the SDK works on a single very large…

arrays data-structures cuda parallel-processing cub

asked Nov 20 '09 at 22:47

zenna

9,006
12
73
101

votes

1 answer

Why does this CUDA reduction fail if I use 31 blocks?

The following CUDA code takes a list of labels (0, 1, 2, 3, ...) and finds the sums of the weights of these labels. To accelerate the calculation, I use shared memory so that each thread maintains its own running sum. At the end of the calculation,…

cuda cub

asked Oct 02 '20 at 22:40

Richard

56,349
34
180
251

votes

0 answers

in-place reduce sum for CUDA (CUB/Thrust)?

I have a device vector that needs to be transformed in multiple ways (e.g. creating 20 new arrays from it) and then reduce all (sum/accumulate), returning those sums in a host vector. The code is working with thrust::transform_reduce but looking at…

cuda gpgpu thrust gpu cub

asked Dec 06 '19 at 14:31

Vinicius Pavanelli

votes

1 answer

Installing CUB in nvidia nsight

I want to use CUB with NVIDIA Nsight. I looked for tutorials on the internet for doing that, but I didn't find anything, even in the official pages of CUB. What do I need to do in order to use CUB in code I write using NVIDIA Nsight?

cuda nvidia nsight cub

asked May 24 '17 at 11:29

sara idrissi

votes

1 answer

CUB template similar to thrust

Following is a thrust code: h_in_value[7] = thrust::reduce(thrust::device, d_in1 + a - b, d_ori_rho_L1 + a); Here, the thrust::reduce takes the first and last input iterator, and thrust returns the value back to the CPU (copied to h_in_value). Can…

c++ cuda gpgpu thrust cub

asked May 11 '17 at 20:08

Ameya Wadekar

votes

1 answer

Sum reduction with CUB

According to this article, sum reduction with the CUB library should be one of the fastest ways to do a parallel reduction. As you can see in the code fragment below, the execution time is measured excluding the first cub::DeviceReduce::Reduce(temp_storage,…

cuda cub

asked Sep 03 '15 at 16:29

physicist

votes

1 answer

CUDA Thrust sort or CUB::DeviceRadixSort

I have a pool of particles represented by an array of float4 where the w component is the particle's current lifetime in the range [0, 1]. I need to sort this array based on the lifetime of the particles in descending order so that I can keep an…

sorting cuda thrust cub

asked Apr 01 '15 at 19:49

Kinru

votes

2 answers

Residual calculation using CUDA

I have two vectors (oldvector and newvector). I need to calculate the value of the residual which is defined by the following pseudocode: residual = 0; forall i : residual += (oldvector[i] - newvector[i])^2 Currently, I am calculating this with two…

c++ cuda thrust cub

asked May 11 '14 at 20:55

aatish

votes

1 answer

CUDA cub::DeviceScan and the temp_storage_bytes parameter

I'm using cub::DeviceScan functions, and the sample code snippet has a parameter temp_storage_bytes, which it uses to allocate memory (which, incidentally, the code snippet never frees). The code snippet calls cub::DeviceScan functions with a pointer…

cuda cub

asked May 09 '14 at 04:41

user2462730

votes

2 answers

Using CUB::DeviceScan

I'm trying to do an exclusive sum reduction in CUDA. I am using the CUB library and have decided to try the CUB::DeviceReduce. However, my result is NaN, and I can't figure out why. Code is: #include #include #include…

cuda cub

asked Apr 29 '14 at 04:59

user2462730
