CUDA - check repeated values and add two values

Question

I have two group of arrays

a1 a2 a3 a4 a5 a6 a7 a8 <= name it as key1
b1 b2 b3 b4 b5 b6 b7 b8 <= val1
c1 c2 c3 c4 c5 c6 c7 c8

and

d1 d2 d3 d4 d5 d6 d7 d8 <= key2
e1 e2 e3 e4 e5 e6 e7 e8 <= val2
f1 f2 f3 f4 f5 f6 f7 f8

The arrays a1,...,an and d1,...,dn are sorted and might be repeated. i.e. their values might be something like 1 1 2 3 4 6 7 7 7 ... I want to check if for each Tuple di,ei check if it is equal to any of ai,bi. If it is (di==ai,bi==ei) then I have to combine fi and ci using some function e.g. add and store in fi.

Firstly, is it possible to do this using zip iterators and transformation in thurst library to solve this efficiently?

Secondly, the simplest method that I can imagine is to count occurance of number of each keys (ai) do prefix sum and use both to get start and end index of each keys and then for each di use above counting to iterate through those indices and check if ei==di. and perform the transformation.

i.e. If I have

1 1 2 3 5 6 7
2 3 4 5 2 4 6
2 4 5 6 7 8 5

as first array, I count the occurance of 1,2,3,4,5,6,7,...:

2 1 1 0 1 1 1 <=name it as count

and then do prefix sum to get:

2 3 4 4 5 6 7  <= name it as cumsum

and use this to do:

for each element di,
    for i in (cumsum[di] -count[di]) to cumsum[di]:
        if ei==val1[i] then performAddition;

What I fear is that since not all threads are equal, this will lead to warp divergence, and I may not have efficient performance.

It's nearly a duplicate of [this question](http://stackoverflow.com/questions/34370201/combining-two-lists-by-key-using-thrust). Having a double key vector makes it a bit more complicated, but not much if we can guarantee that the second sub-key vector is ordered where the main key vector has duplicates. — Robert Crovella, Jul 05 '16 at 21:55
Thank you for your remark. Sadly, its not possible to guarantee the ordering. — user2178841, Jul 05 '16 at 22:04
You could impose the ordering (i.e. re-order the duplicates, so the sub-key is ordered). Otherwise you are back to brute-force searches (at least within a duplicated key) which will be significantly less efficient. — Robert Crovella, Jul 05 '16 at 22:11

kangshiyin · Answer 1 · 2016-07-06T12:04:19.987

You could treat your data as two key-value tables.Table1: (a,b) -> c and Table2: (d,e)->f, where pair (a,b) and (d,e) are keys, and c, f are values.

Then your problem simplifies to

foreach key in Table2
  if key in Table1
    Table2[key] += Table1[key]

Suppose a and b have limited ranges and are positive, such as unsigned char, a simple way to combine a and b into one key is

unsigned short key = (unsigned short)(a) * 256 + b;

If the range of key is still not too large as in the above example, you could create your Table1 as

int Table1[65536];

Checking if key in Table1 becomes

if (Table1[key] != INVALID_VALUE)
  ....

With all these restrictions, implementation with thrust should be very simple.

Similar combining method could still be used if a and b have larger range like int.

But if the range of key is too large, you have to go to the method suggested by Robert Crovella.

CUDA - check repeated values and add two values

1 Answers1