How to perform relational join on two data containers on GPU (preferably CUDA)?

Question

What I'm trying to do:

On the GPU, I'm trying to mimic the conventions used by SQL in relational algebra to perform joins on tables (e.g. Inner Join, Outer Join, Cross Join). In the code below, I'm wanting to perform an Inner Join. Imagine two tables (containers) where one table is the Parent/Master table and the other is the Child table. The Parent to Child join relationship is 1 to many (or 1 to none, in the case that there is no element in Child_ParentIDs that matches an element in Parent_IDs).

Example input data:

Parent_IDs:    [1, 2,  3,  4, 5]  ... 5 elements
Parent_Values: [0, 21, 73, 0, 91] ... 5 elements
Child_ParentIDs:   [1,   1,   1,  2,   3,   5,  5]  ... 7 elements
Child_Permanences: [120, 477, 42, 106, 143, 53, 83] ... 7 elements
Child_Values:      [0,   0,   0,  0,   0,   0,  0]  ... 7 elements

Operation as an SQL query:

SELECT child.permanence * parent.value FROM child, parent WHERE child.parent_id = parent.id;

Operation description:

Join Child_ParentIDs to Parent_IDs to access the corresponding Parent_Values. Use the corresponding Parent_Values to multiply against the corresponding Child_Permanences and place the result of each operation into Child_Values.

Expected output (Child_Values is the only changed vector during the operation):

Child_ParentIDs:   [1,   1,   1,  2,    3,     5,    5]     ... 7 elements
Child_Permanences: [120, 477, 42, 106,  143,   53,   83]    ... 7 elements
Child_Values:      [0,   0,   0,  2226, 10439, 4823, 7553]  ... 7 elements

Explanation (in case it didn't make sense):

The value of 2226 is derived by multiplying 106 and 21. 10439 was from multiplying 143 and 73. Also note that ALL entries are preserved on the child vectors (all 7 elements still exist in the output, albeit with Child_Values individual elements updated). The Parent vectors are not preserved in the output (notice ParentID 4 missing from the list of vectors and there is no "dummy" placeholder for it there). This is the behavior of an "Inner Join".

Ideas of elegant solutions that I have not gotten to work:

-Utilizing CUDA's Dynamic Parallelism. Perhaps the only solution on the entire internet I have found that does exactly what I'm trying to do is here-part 1 and here-part 2.

-Using CUDPP's hashing operations;

-Alenka DB.

And finally, my question reiterated:

Is there any working solution from a purely GPU perspective (preferably with CUDA, but OpenCL would work too) for accomplishing Relational Joins on two separate containers of data so that the data can be searched and elements updated in parallel via said joins?

EDIT
Parent_IDs won't always be a sequence. During run-time it is possible for elements from the Parent vectors to be removed. Newly inserted Parent elements will always be appended with an ID that is seeded from the last element's ID. With that said, I understand this means Child elements can be orphaned but I'm not addressing the solution for that here.

TL;DR: Are you really just asking whether a off-the-shelf solution to this exists? — talonmies, Jun 14 '16 at 13:36
If you don't want to read any of this then sure, point me to an off-the-shelf solution. I'll adapt it to my needs, and update here. — aiwyn, Jun 14 '16 at 14:43
I think you can build a solution based on the solution in http://stackoverflow.com/a/34371396/5153458 — eg0x20, Jun 14 '16 at 15:08
@eg0x20 incidentally, I studied the solutions provided there yesterday :D. The second method provided by Robert Crovella in particular is excellent when the data being "joined" has no duplicate IDs (keys). His solution works there because the data in both containers is the same kind (able to be merged on sequential keys and then offset by 1 in the kernel to apply the multiplication). My needs are a bit different (notice duplicate Child_ParentIDs). — aiwyn, Jun 14 '16 at 15:18
If orphaned child appears due to parent deletion, what do you expect the result to be? Or there's only parent addition but no deletion? — kangshiyin, Jun 14 '16 at 15:29
Well I never plan on allowing orphaned children elements, I was merely saying it isn't something I've overlooked. I would resolve this by deleting all child elements that correspond to the deleted parent element when said parent element is first removed. — aiwyn, Jun 14 '16 at 15:32
@aiwyn : You accepted an answer assuming that the parent_id field in child is essentially a Join Index, i.e. you can use it (or its value minus 1) as the index into the parents table. But your edit suggests that might not be the case, and so does the title of your question. In fact, your title is even more general that that... I suggest you remove your edit, change the title, and possibly ask another question about the more general case, since this question in itself is valid. I would have a (partial) answer to your general question, since this is related to **my Post-Doc research topic**. — einpoklum, Jul 13 '16 at 15:37
@einpoklum it is my understanding that all SQL joins are reduced to index manipulations and permutations (even if only temporary indices). Therefore, the selected answer is technically correct with slight modifications; even if it does appear to answer a more specific question, it can also be applied to more general cases. If you do know of a better fully worked way then we'd love to hear it. Please update the answer or add your own. — aiwyn, Jul 14 '16 at 16:40
@aiwyn: No, that is actually not the case. Even Joins where there's a 1-to-0-or-1 relationship aren't reduced to that. It's a very simple (though common) special case. Generally for a Join you either sort or hash (among other things). — einpoklum, Jul 14 '16 at 18:45

kangshiyin · Accepted Answer · 2016-06-14T16:54:49.790

1

It looks like a simple element-wise multiplication between elements of Child_Permanences and selected elements from Parent_Values. With a few restirctions, this can be done with a single thrust::transform.

thrust::transform(
    Child_Permanences.begin(),
    Child_Permanences.end(),
    thrust::make_permutation_iterator(
        Parent_Values.begin(),
        thrust::make_transform_iterator(Child_ParentIDs.begin(),
                                        _1 - 1)),
    Child_Values.begin(),
    _1 * _2);

You may notice that Parent_IDs is not used. It is the restriction of the above code. The code assumes that Parent_IDs can be nothing but a 1-base sequence. You will find that thrust::make_transform_iterator is not required if Parent_IDs is a 0-base sequence, or Child_ParentIDs is just a parent value index as follows given your example.

Child_ParentIDs:   [0, 0, 0, 1, 2, 4, 4]

EDIT

The above code assumes that 1) there's no orphaned child; and 2) Parent_IDs is a fixed 1-based sequence like 1, 2, 3, ...

On the condition that

there's no orphaned child;
Parent_IDs is unordered and unique;
Child_ParentIDs is unrodered but not unique;

and the fact that your Parent_IDs is of the type int16, you could create a parent value index table for child element to look up, when the range of Parent_IDs is reasonably small.

Assuming the range of Parent_IDs is [1, 32767], the solution code could be

thrust::device_vector<int> Parent_index(32768, -1);
thrust::scatter(thrust::make_counting_iterator(0),
                thrust::make_counting_iterator(0) + Parent_IDs.size(),
                Parent_IDs.begin(),
                Parent_index.begin());
thrust::transform(
    Child_Permanences.begin(),
    Child_Permanences.end(),
    thrust::make_permutation_iterator(
        Parent_Values.begin(),
        thrust::make_permutation_iterator(
            Parent_index.begin(),
            Child_ParentIDs.begin())),
    Child_Values.begin(), _1 * _2);

Note that Parent_index need to be re-created each time the parent vector is modified.

edited Jun 14 '16 at 16:54

answered Jun 14 '16 at 14:19

kangshiyin

9,681
1
17
29

You are correct about Parent_IDs starting at 1 instead of 0 but the suggested implementation won't work because Parent_IDs won't always be a sequence (although, they will always be unique). Suppose a Parent element is removed at nth position from the Parent vectors. The reason I used SQL Inner Join to explain what I was looking for is because it accounts for a parent element being removed; such as in a database table delete operation. It has no 0- or 1-base sequence dependency. – aiwyn Jun 14 '16 at 14:52
The point you made about not having to use thrust::make_transform_iterator is helpful though. The only reason I opt for 1-base over 0-base in this case is because some database operations can benefit by reserving a value of 0 in the ID (key) list for inserts in which the inserted elements can be temporarily marked with an ID of 0 until the entire transaction is complete; at which point all IDs of 0 are appropriated with a sequence of ID values seeded from the last index of the table. – aiwyn Jun 14 '16 at 14:59
I think you could simplify your question to the example only, maybe some explanations. But you definitely want to add "Parent_IDs won't always be a sequence" and that a parent element may be added/deleted at run-time to your questions... – kangshiyin Jun 14 '16 at 14:59
1-base/0-base is not important compared to your two additional requirements indicated in my previous comment. For the 2 requirements, I think you need to find the parent value index before calling the code in my answer each time the parent data is modified. – kangshiyin Jun 14 '16 at 15:02
Thanks for the suggestion; I made an edit to my OP. Regarding "I think you need to find the parent value index before calling", yes that is the major problem I need help addressing. Specifically, I need to create a parent value index list that can overlay the elements in the Child vectors on a 1 to 1 basis. Any ideas on how that can be done here without a for-loop? – aiwyn Jun 14 '16 at 15:11
Thanks for the formatting help and suggestions. I'm going to go ahead and mark this as the answer because I seriously think it's as close as I'm going to get regarding the scope of this question, so great job! I'll open a new one a little later to address a secondary concern. – aiwyn Jun 14 '16 at 17:09

How to perform relational join on two data containers on GPU (preferably CUDA)?

1 Answers1