I'm writing a kernel in C++/CUDA in which I need to add two 128-bit numbers, represented as follows:
```cpp
struct Int128 {
    unsigned int bit0_31;
    unsigned int bit32_63;
    unsigned int bit64_95;
    unsigned int bit96_127;
};

Int128 sum(Int128 a, Int128 b) {
    ...?
}
```
What is the most efficient C++ way to achieve this? Thanks.
P.S. Overflow into a 129th bit can be ignored, i.e. the carry out of bit 127 can simply be dropped.
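A minimal sketch of one possible approach, assuming `unsigned int` is 32 bits (as it is in CUDA and on common host ABIs). On the device it uses PTX's carry-propagating `add.cc.u32` / `addc.cc.u32` / `addc.u32` instructions through inline `asm`. The carry flag is only guaranteed to survive within a single `asm` statement, so all four limbs are added in one statement. On the host it falls back to propagating the carry manually through 64-bit intermediates. The `__host__`/`__device__` stub macros are an assumption added only so the same file also builds with an ordinary C++ compiler. Whether the portable version already compiles to the same carry chain is worth checking in the generated SASS; this is not a claim that the asm path is always faster.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: stub out the CUDA qualifiers so this also compiles with g++/clang++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// The struct layout relies on 32-bit limbs.
static_assert(sizeof(unsigned int) == 4, "Int128 assumes 32-bit unsigned int");

struct Int128 {
    unsigned int bit0_31;
    unsigned int bit32_63;
    unsigned int bit64_95;
    unsigned int bit96_127;
};

__host__ __device__ inline Int128 sum(Int128 a, Int128 b) {
    Int128 r;
#if defined(__CUDA_ARCH__)
    // Device path: add.cc sets the carry flag, addc.cc consumes and sets it,
    // and the final addc consumes it (the carry out of bit 127 is discarded).
    asm("add.cc.u32  %0, %4, %8;\n\t"
        "addc.cc.u32 %1, %5, %9;\n\t"
        "addc.cc.u32 %2, %6, %10;\n\t"
        "addc.u32    %3, %7, %11;"
        : "=r"(r.bit0_31), "=r"(r.bit32_63), "=r"(r.bit64_95), "=r"(r.bit96_127)
        : "r"(a.bit0_31), "r"(a.bit32_63), "r"(a.bit64_95), "r"(a.bit96_127),
          "r"(b.bit0_31), "r"(b.bit32_63), "r"(b.bit64_95), "r"(b.bit96_127));
#else
    // Portable path: each 32-bit partial sum is computed in 64 bits,
    // so the carry (0 or 1) lands in bit 32 and is fed into the next limb.
    uint64_t t = (uint64_t)a.bit0_31 + b.bit0_31;
    r.bit0_31 = (unsigned int)t;
    t = (uint64_t)a.bit32_63 + b.bit32_63 + (t >> 32);
    r.bit32_63 = (unsigned int)t;
    t = (uint64_t)a.bit64_95 + b.bit64_95 + (t >> 32);
    r.bit64_95 = (unsigned int)t;
    // Top limb wraps modulo 2^32, dropping the carry out of bit 127.
    r.bit96_127 = a.bit96_127 + b.bit96_127 + (unsigned int)(t >> 32);
#endif
    return r;
}
```

For example, adding `{0xFFFFFFFF, 0xFFFFFFFF, 0, 0}` (2^64 - 1) and `{1, 0, 0, 0}` gives `{0, 0, 1, 0}`, i.e. 2^64, with the carry rippling through the two low limbs. If the toolchain allows it, another option is the `unsigned __int128` type: it is a GCC/Clang extension on the host, and newer CUDA toolkits also accept `__int128` in device code, so check your compiler version before relying on it.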