I am building a reinforcement learning model on the GPU, so I am using chainer, which has the cupy backend. cupy is intended to be a duplicate of numpy, except that it operates on the GPU.
I asked this question earlier on how to do fast bit shifting on a scalar in numpy, and the answer was easy: I need to do my bit shifting on the actual numpy.uint64 object and not on a numpy.array object. It would be nice if I could transfer my bit shifting methods over to cupy to get the same speedups.
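For reference, here is a minimal sketch of the two numpy variants I am comparing. The function names `shift_scalar` and `shift_array` are made up for this illustration and are not from my actual code; the only difference between them is whether the value lives in a numpy.uint64 scalar or in a one-element array.

```python
import numpy as np

def shift_scalar(value, shifts):
    """Shift a true numpy.uint64 scalar left by one bit, `shifts` times."""
    x = np.uint64(value)
    one = np.uint64(1)  # same dtype as x, so no type promotion happens
    for _ in range(shifts):
        x = x << one
    return x

def shift_array(value, shifts):
    """Same loop, but the value is wrapped in a one-element uint64 array."""
    x = np.array([value], dtype=np.uint64)
    one = np.uint64(1)
    for _ in range(shifts):
        x = x << one  # every iteration goes through the array machinery
    return x
```

Both return the same bits; timing each with `timeit` over a large number of shifts is how I would compare them.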
However, the cupy documentation requires that scalars be on the GPU instead of the CPU (source). This means that either I...
- represent my scalar as an array, which brings up my original problem in my linked question above, or
- push my scalar integer to the CPU, do my computations, and push it back to the GPU, which is also slow.
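The two options can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the loop is a stand-in for my real computation, and the numpy fallback is there only so the snippet runs on a machine without cupy.

```python
import numpy as np

try:
    import cupy as xp  # GPU arrays, if cupy is installed
except ImportError:
    xp = np  # assumption: fall back to numpy so the sketch still runs

def shift_as_array(value, shifts):
    """Option 1: keep the scalar on the device as a 0-d uint64 array."""
    x = xp.asarray(value, dtype=xp.uint64)
    one = xp.asarray(1, dtype=xp.uint64)
    for _ in range(shifts):
        x = xp.left_shift(x, one)  # with cupy, each call is a separate GPU operation
    return x

def shift_via_host(x_dev, shifts):
    """Option 2: copy to the CPU, shift a numpy.uint64 there, copy back."""
    value = np.uint64(x_dev.item())  # device-to-host transfer
    one = np.uint64(1)
    for _ in range(shifts):
        value = value << one
    return xp.asarray(int(value), dtype=xp.uint64)  # host-to-device transfer
```

In the first version the per-call overhead is paid on every shift; in the second, the transfers are paid once per round trip, which hurts when the shifts are interleaved with other GPU work.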
If I want to do hundreds of thousands of bit shifts on a scalar value, this takes less than a second in numpy but way too long in cupy. How do I speed up bit shifting of scalars in cupy?