
I am trying to port finite-field CPU code to the GPU, and in the process I would like to generate random vectors to test the speed of my functions.

I need two random vectors of uint64_t (and the corresponding two vectors of double, holding a floating-point representation of the finite-field elements), each of size N. As far as I know, uint64_t is not natively supported on the GPU and is emulated using two 32-bit registers.

These vectors will contain integers in the range [0, p-1], where p is a prime number, e.g. (1<<25) - 39. (This prime uses 25 bits, but I still need 64 bits to store intermediate results before reduction modulo p.)
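
A minimal sketch of why the wider type matters, assuming residues already lie in [0, p-1]. The helper names `mulmod_u64` and `mulmod_f64` are hypothetical and only serve as illustration; they are not taken from the code being ported.

```cpp
#include <stdint.h>
#include <math.h>

// Hypothetical helpers for illustration; not part of the original code.

// Multiply two residues in [0, p-1] and reduce modulo p.
// With p < 2^25 the product is below 2^50: too large for 32 bits,
// but well within a 64-bit integer.
static inline uint64_t mulmod_u64(uint64_t a, uint64_t b, uint64_t p) {
    return (a * b) % p;
}

// Same operation on the double representation. The product is below 2^50,
// so it is exactly representable in a double (53-bit significand), and
// fmod returns the exact remainder.
static inline double mulmod_f64(double a, double b, double p) {
    return fmod(a * b, p);
}
```

For p = (1<<25) - 39, both versions should agree on every pair of residues, since no intermediate value exceeds 2^50.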

I have tried to understand the cuRAND API and generate random vectors with it.

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda.h>
#include <time.h>
#include <curand.h>

int main() {
  uint64_t p = (1 << 25) - 39;
  const uint32_t N = (1 << 27);
  uint64_t *au;
  double *ad;
  cudaError_t handle;

  handle = cudaMallocManaged(&au, N*sizeof(uint64_t));
  handle = cudaMallocManaged(&ad, N*sizeof(double));

  curandGenerator_t gen_type;
  curandCreateGenerator(&gen_type, CURAND_RNG_PSEUDO_MRG32K3A);
  curandSetPseudoRandomGeneratorSeed(gen_type, (uint64_t)time(NULL));

  curandGenerateLongLong(gen_type, au, p);

  cudaFree(au);
  cudaFree(ad);
  return 0;
}
```

nvcc reports that `au` has an incompatible type in the `curandGenerateLongLong` call. According to the cuRAND API documentation, I am constrained to use the SOBOL64 quasirandom generator. Why is that? Is there a pseudorandom generator for uint64_t, or is a quasirandom generator suitable for my case?

If I want to avoid quasirandom generation, it seems I am forced to generate the numbers on the CPU and copy my random vectors to the GPU. Can I use the device cuRAND library (curand_kernel.h) for my use case?

Dimitri Lesnoff
  • You haven't defined `handle` anywhere in your posted code. Regarding the problem with `au`, you are running into [this](https://stackoverflow.com/questions/32198368/unsigned-long-long-conflict-with-uint64-t). With CUDA on 64-bit Linux, I consider that to be an annoyance; perhaps someone will correct me. Why not just generate `unsigned long long` instead of `uint64_t`? How does that not work for your use case? You could just do this: `curandGenerateLongLong(gen_type, (unsigned long long *)au, p);` – Robert Crovella Jun 22 '22 at 15:14
  • Thanks, I added the handle definition. I agree with you: these type differences do not make sense. – Dimitri Lesnoff Jun 22 '22 at 15:37

1 Answer


On 64-bit Linux platforms supported by CUDA (at least), there is no numerical difference between the representation and semantics of uint64_t and unsigned long long. I acknowledge that the types are distinct, but the difference isn't meaningful for the use case you have shown.

It should be fine for you to modify your code as follows:

```cpp
curandGenerateLongLong(gen_type, (unsigned long long *)au, p);
```

and you will get an array of uint64_t generated.

(On 64-bit Windows, I suspect you would not even get the error you are reporting, but I have not tested it.)
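
Whichever generator fills the buffer, the raw 64-bit words still have to be reduced into [0, p-1] and mirrored into the double vector. The following is a minimal host-side sketch of that step, not a definitive implementation; `reduce_to_field` and its parameters are hypothetical names. Two details are worth checking against the cuRAND documentation: the third parameter of `curandGenerateLongLong` is declared as `size_t num`, the number of values to generate (so presumably N rather than p here), and the function is documented for 64-bit quasirandom generators, so inspecting the returned `curandStatus_t` is advisable.

```cpp
#include <stddef.h>
#include <stdint.h>

// The cast relies on both types being 64-bit unsigned integers.
static_assert(sizeof(unsigned long long) == sizeof(uint64_t),
              "unsigned long long and uint64_t must have the same size");

// Hypothetical helper: map raw 64-bit random words to field elements in
// [0, p-1] and fill the matching double vector.
static void reduce_to_field(const unsigned long long *raw, uint64_t *au,
                            double *ad, size_t n, uint64_t p) {
    for (size_t i = 0; i < n; ++i) {
        au[i] = (uint64_t)(raw[i] % p);  // residue in [0, p-1]
        ad[i] = (double)au[i];           // exact, since p is far below 2^53
    }
}
```

Because 2^64 is vastly larger than a 25-bit p, the bias introduced by the plain `%` reduction is very small; whether it is acceptable depends on the test being run.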

Robert Crovella
  • As an addendum, to answer the titular question: Any 64-bit PRNG can be deployed in device code. `cuRAND` is just a convenient library that takes care of potentially tricky parallelization and initialization issues for the programmer. Nothing precludes CUDA programmers from deploying their favorite 64-bit PRNG (I am partial to Marsaglia's KISS64, for example) in device code, and for various use cases it may not even be necessary to make sure that the random numbers consumed by each thread are guaranteed to be completely independent. – njuffa Jun 22 '22 at 17:21
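
As a rough illustration of the point made in the last comment, here is a sketch of a small 64-bit generator that compiles for both host and device. It uses SplitMix64 rather than the KISS64 generator mentioned above, simply because it is short. The `HOST_DEVICE` macro and the helper `random_field_element` are hypothetical names introduced for this example, and deriving one value per index from `seed + thread_index` is an assumption that gives no guarantee of statistical independence between threads.

```cpp
#include <stdint.h>

// Under nvcc the functions are usable from host and device code;
// with a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// SplitMix64 step: advances the state and returns the next 64-bit output.
HOST_DEVICE inline uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical per-thread usage: each index derives its own state from a
// common seed, draws one 64-bit word, and reduces it into [0, p-1].
HOST_DEVICE inline uint64_t random_field_element(uint64_t seed,
                                                 uint64_t thread_index,
                                                 uint64_t p) {
    uint64_t state = seed + thread_index;
    return splitmix64_next(&state) % p;
}
```

Inside a kernel, `thread_index` would typically be `blockIdx.x * blockDim.x + threadIdx.x`, with each thread writing its result to `au[thread_index]` and the converted value to `ad[thread_index]`.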