
I'm writing a CUDA program that fills a statically sized matrix with a given value, but it segfaults and I don't know why. I think the failing line is the one that copies the matrix back to the host, but I can't figure out a different way to do it.

#include <cuda.h>
#include <iostream>

using namespace std;

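__global__ void initKernel(float A[][65536], int n, int m, float value){
    // One thread per element; i is the flattened (row-major) index.
    int i = blockDim.x*blockIdx.x + threadIdx.x;
    if(i<n*m){
        int x=i/m;  // row
        int y=i%m;  // column
        A[x][y]=value;  // write only when the index is in range
    }
}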

void matrixInit(float A[][65536], int n, int m, float value){

    size_t size=(size_t)n*m*sizeof(float);
    int block_size = 32;
    // Round up so a partial last block is still launched.
    int number_of_blocks = (n*m + block_size - 1)/block_size;

    float (*d_A)[65536];
    cudaMalloc((void**)&d_A, size);
    cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
    initKernel<<<number_of_blocks, block_size>>>(d_A, n,m,value);
    cudaMemcpy(A,d_A,size,cudaMemcpyDeviceToHost);
    cudaFree(d_A);
}

int main(){
    int n=4096;
    int m=65536;
    float A[4096][65536];
    matrixInit(A,n,m,1.0);
}
  • Your stack-based matrix: `float A[4096][65536];` is too large. You could replace it with a dynamic allocation, or just move that to global scope, out of `main`. – Robert Crovella Oct 18 '21 at 17:07
  • I tried to move it outside of main but it still gives segfault. – ZeroDivisor Oct 18 '21 at 17:29
  • When I move `float A[4096][65536];` outside of main to global scope, the seg fault disappears for me. – Robert Crovella Oct 18 '21 at 17:39
  • Yeah, the problem was probably that it needed too much memory. I tried a smaller matrix and it works. However, could you please provide example code showing how to allocate a dynamic matrix in CUDA and how to move it from host to device and vice versa? I searched, but it's a bit confusing and this is my first experience with it... – ZeroDivisor Oct 18 '21 at 17:54
  • There are probably at least dozens of those here on the `cuda` tag. [This answer](https://stackoverflow.com/questions/45643682/cuda-using-2d-and-3d-arrays/45644824#45644824) covers a variety of different methods with examples. Note that the proximal issue you have here is a host-code issue that has nothing to do with CUDA directly. Replacing your host-code static allocation with a dynamic allocation is purely a C++ question, not a CUDA one. – Robert Crovella Oct 18 '21 at 18:06
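
For reference, here is a minimal sketch of the dynamic-allocation approach suggested in the comments. It is one possible way to do it, not the method from the linked answer. It assumes the matrix can be stored as a flattened, row-major `std::vector<float>` on the host, where element `(x, y)` lives at index `x * m + y`. The names `fillKernel` and `makeFilledMatrix` are hypothetical and do not appear in the question. Since 4096 × 65536 floats come to 1 GiB, the host buffer is placed on the heap rather than the stack.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// One thread per element of the flattened, row-major matrix.
__global__ void fillKernel(float *A, size_t count, float value) {
    size_t i = (size_t)blockDim.x * blockIdx.x + threadIdx.x;
    if (i < count) {
        A[i] = value;  // element (x, y) lives at index x * m + y
    }
}

// Hypothetical helper, not from the question: returns an n x m matrix
// filled with `value` on the GPU, or an empty vector if a CUDA call fails.
std::vector<float> makeFilledMatrix(int n, int m, float value) {
    size_t count = (size_t)n * m;
    size_t bytes = count * sizeof(float);
    if (count == 0) return {};

    float *d_A = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_A, bytes);
    if (err == cudaSuccess) {
        int block = 256;
        // Round up so a partial last block is still launched.
        unsigned int grid = (unsigned int)((count + block - 1) / block);
        fillKernel<<<grid, block>>>(d_A, count, value);
        err = cudaGetLastError();  // reports launch-configuration errors
    }

    std::vector<float> A;
    if (err == cudaSuccess) {
        A.resize(count);  // heap allocation, so 1 GiB does not hit the stack limit
        // cudaMemcpy waits for the kernel on the default stream before copying.
        err = cudaMemcpy(A.data(), d_A, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_A);  // freeing a null pointer is a no-op

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        A.clear();
    }
    return A;
}
```

Calling `makeFilledMatrix(4096, 65536, 1.0f)` from `main` would replace both the stack array and `matrixInit` from the question. Because the kernel overwrites every element, this sketch skips the initial host-to-device copy. If the device needs the existing host data, a `cudaMemcpy` with `cudaMemcpyHostToDevice` before the launch would do that.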
