
I am trying to accelerate RSA encryption using CUDA, but I can't get the power-modulo operation to work properly in the kernel function.

I am compiling with the CUDA compilation tools on AWS, release 9.0, V9.0.176.

#include <cstdio>
#include <math.h>
#include "main.h"

// Kernel function to encrypt the message (m_in) elements into cipher (c_out)
__global__
void enc(int numElements, int e, int n, int *m_in, int *c_out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    printf("e = %d, n = %d, numElements = %d\n", e, n, numElements);
    for (int i = index; i < numElements; i += stride)
    {
// POINT OF ERROR //
        // c_out[i] = (m_in[i]^e) % n;     //**GIVES WRONG RESULTS**
        c_out[i] = __pow(m_in[i], e) % n;  //**GIVES error: expression must have integral or enum type**
    }


}
// This function is called from main() in another file.
int* cuda_rsa(int numElements, int* data, int public_key, int key_length)
{
    int e = public_key;
    int n = key_length;

    // Allocate Unified Memory – accessible from CPU or GPU
    int* message_array;
    cudaMallocManaged(&message_array, numElements*sizeof(int));
    int* cipher_shared_array;       //Array shared by CPU and GPU
    cudaMallocManaged(&cipher_shared_array, numElements*sizeof(int));

    int* cipher_array = (int*)malloc(numElements * sizeof(int));

    //Put message array to be encrypted in a managed array
    for(int i=0; i<numElements; i++)
    {
        message_array[i] = data[i];
    }

    // Launch the encryption kernel (here with a single block and a single thread)
    enc<<<1, 1>>>(numElements, e, n, message_array, cipher_shared_array);

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();

    //Copy into a host array and return it to main() for verification.
    //(Memory leaks are ignored here.)
    for(int i=0; i<numElements; i++)
    {
        cipher_array[i] = cipher_shared_array[i];
    }
    return (cipher_array);
}

Please help me with this error. How can I implement power-modulo (as follows) in a CUDA kernel?

(x ^ y) % n;

I would really appreciate any help.

onie

1 Answer


In C or C++, this:

(x^y) 

does not raise x to the power of y. It performs a bitwise exclusive-OR (XOR) operation. That is why your first attempt does not give the correct results.
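
For a quick sanity check, here is a minimal host-side sketch (the values 5 and 2 are purely illustrative):

int x = 5, y = 2;
printf("%d\n", x ^ y);                    // prints 7 (bitwise XOR: 101 ^ 010 = 111)
printf("%f\n", pow((double)x, (double)y)); // prints 25.000000 (exponentiation)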

In C or C++, the modulo arithmetic operator:

%

is only defined for integer operands. Even though you are passing integers to the __pow() function, the result that function returns is a double (i.e. a floating-point quantity, not an integer quantity).
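
As a minimal standalone sketch of why the compiler complains (not your kernel, just an illustration):

double d = pow(5.0, 2.0);
// int bad = d % 3;     // error: expression must have integral or enum type
int ok = (int)d % 3;    // compiles once the floating-point result is cast to an integer type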

I don't know the details of the math you need to perform, but if you cast the result of __pow to an int (for example) this compile error will disappear. That may or may not be valid for whatever arithmetic you wish to perform. (For example, you may wish to cast it to a "long" integer quantity.)

After you do that, you will run into another compile error. The easiest approach is to use pow() instead of __pow():

c_out[i] = (int)pow(m_in[i], e) % n;

If you were actually trying to use the CUDA fast-math intrinsic, you should use __powf not __pow:

c_out[i] = (int)__powf(m_in[i], e) % n;

Note that fast-math intrinsics generally have reduced precision.

Since these raise-to-power functions perform floating-point arithmetic (even though you are passing integers), you may get unexpected results. For example, if you raise 5 to the power of 2, it is possible to get 24.9999999999 instead of 25. Simply casting that to an integer truncates it to 24. Therefore you may need to round your result to the nearest integer instead of casting. But again, I haven't studied the math you intend to perform.
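
If you decide rounding is needed, one possibility (a sketch only, using pow() and the standard rint() rounding function, both of which are callable from CUDA device code) is:

c_out[i] = (int)rint(pow((double)m_in[i], (double)e)) % n;

Whether int is wide enough for the intermediate value is a separate question for the key sizes you are using.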

Robert Crovella