Question
When passing thread_group type objects to a device function, is there a preference for passing by reference vs. passing by value?
- Is one of them "correct"?
- What are the differences between the two approaches?
- When should each approach be preferred?
Examples
Similar examples in the programming-guide and the developer blog seem to handle this differently.
Programming Guide
__device__
int sum(const thread_block& g, int *x, int n) {
// ...
g.sync();
return total;
}
Developer Blog
__device__
int sum(thread_block block, int *x, int n) {
...
block.sync();
...
return total;
}
Additional Info
The programming-guide also has this to say about constructing implicit groups:
Although you can create an implicit group anywhere in the code, it is dangerous to do so. Creating a handle for an implicit group is a collective operation—all threads in the group must participate. If the group was created in a conditional branch that not all threads reach, this can lead to deadlocks or data corruption. For this reason, it is recommended that you create a handle for the implicit group upfront (as early as possible, before any branching has occurred) and use that handle throughout the kernel. Group handles must be initialized at declaration time (there is no default constructor) for the same reason and copy-constructing them is discouraged.
This would lead me to believe that passing them by reference is preferred, but there is enough detail underlying the various cooperative groups that I have likely missed some nuance. Would passing by value be considered "copy-constructing" and therefore be discouraged?
I have not noticed any performance or result difference between the two approaches, but I may simply not have tested the right edge case, or any "undefined behavior" may just happen to work out in a way that doesn't cause a problem.
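To make the comparison concrete, here is a minimal sketch of the kind of test I mean. It is not taken from the programming guide or the blog. The names sum_by_ref, sum_by_val, compare, variants_agree, and kThreads are hypothetical, and the shared-memory reduction is only a stand-in for the elided bodies in the two examples above. It assumes a single block whose size is a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int kThreads = 256;  // assumption: one block, power-of-two size

// Hypothetical variant 1: group handle taken by const reference
// (the style used in the programming guide example).
__device__ int sum_by_ref(const cg::thread_block& g, int* scratch, int val) {
    unsigned int rank = g.thread_rank();
    scratch[rank] = val;
    g.sync();
    // Tree reduction in shared memory.
    for (unsigned int stride = g.size() / 2; stride > 0; stride /= 2) {
        if (rank < stride) scratch[rank] += scratch[rank + stride];
        g.sync();
    }
    int total = scratch[0];
    g.sync();  // everyone has read scratch[0] before it is reused
    return total;
}

// Hypothetical variant 2: group handle taken by value
// (the style used in the developer blog example).
__device__ int sum_by_val(cg::thread_block block, int* scratch, int val) {
    unsigned int rank = block.thread_rank();
    scratch[rank] = val;
    block.sync();
    for (unsigned int stride = block.size() / 2; stride > 0; stride /= 2) {
        if (rank < stride) scratch[rank] += scratch[rank + stride];
        block.sync();
    }
    int total = scratch[0];
    block.sync();
    return total;
}

__global__ void compare(int* out_ref, int* out_val) {
    __shared__ int scratch[kThreads];
    // Handle created up front, before any branching, as the guide recommends.
    cg::thread_block block = cg::this_thread_block();
    int val = static_cast<int>(block.thread_rank()) + 1;  // values 1..kThreads
    int a = sum_by_ref(block, scratch, val);
    int b = sum_by_val(block, scratch, val);
    if (block.thread_rank() == 0) {
        *out_ref = a;
        *out_val = b;
    }
}

// Returns true if both variants produce the expected sum 1 + 2 + ... + kThreads.
bool variants_agree() {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;
    compare<<<1, kThreads>>>(d_out, d_out + 1);
    int h_out[2] = {0, 0};
    // cudaMemcpy waits for the kernel and reports any launch/execution error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    const int expected = kThreads * (kThreads + 1) / 2;
    return err == cudaSuccess && h_out[0] == expected && h_out[1] == expected;
}

int main() {
    bool ok = variants_agree();
    std::printf("variants agree: %s\n", ok ? "yes" : "no");
    return ok ? 0 : 1;
}
```

A test like this only shows that both signatures give the same answer for one simple, non-divergent case. It says nothing about whether copying the handle is safe in general, which is the part I am unsure about.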