
I have a problem with a char array in a kernel function. I want to split a large char array into smaller char arrays.

    __global__ void kernelExponentLoad(char* BiExponent,int lines){ 
        // BiExponent is formed from 80x100000 numbers
        const int numThreads = blockDim.x * gridDim.x;
        const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
        for (int k = threadID; k < 100000; k += numThreads){
            char* cstr = new char[80];
            for(int i=0; i<80; i++){    
                cstr[i] = BiExponent[(k*80)+i];
                ...
                delete[] cstr;
            }
        }
    }

My solution doesn't work: the kernel crashes (stops working) right after it starts. The data in `char* BiExponent` are fine (printing them with printf works).

  • You are not providing enough information on your code. You should post fully compilable and executable code so that people willing to help you can run and check it. Before that, you could start testing the code yourself by adding [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) and using `cuda-memcheck` to catch out-of-bounds-like errors, as also mentioned at the [CUDA tag info page](http://stackoverflow.com/tags/cuda/info). – Vitality Nov 09 '13 at 23:04
  • As @JackOLantern mentioned, we are missing the information needed to deduce the problem. I would expect, however, that it may be the line `cstr[i] = BiExponent[(k*80)+i];` which will read beyond the bounds of BiExponent unless you have launched N/80 total threads (where N is the number of elements in BiExponent). – MorbidFuzzball Nov 11 '13 at 19:17
  • @MorbidFuzzball that line will not read beyond the end of BiExponent assuming BiExponent has a size of 80x100000 as indicated in the comments. We can agree that k is constrained to be less than 100000 by the for-loop, right? Then 80*99999 + 79 is less than 80x100000. This is independent of the number of threads launched, except that the number of threads launched should be less than 100,000. – Robert Crovella Nov 11 '13 at 19:33
  • @RobertCrovella Yup, you're right. I spent some time after my initial comment looking at the logic, and came to the same conclusion. I think the answer you gave below must be the real source of the problem. – MorbidFuzzball Nov 11 '13 at 19:39
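
The first comment suggests adding error checking. A minimal sketch of that pattern follows; the macro `CUDA_CHECK`, the kernel `touchRecords` and the host helper `runChecked` are hypothetical names, not taken from the question, while the runtime calls are standard CUDA runtime API. A bad launch configuration is reported by `cudaGetLastError()`, whereas a fault that happens while the kernel is running (such as an invalid memory access) only surfaces at the next synchronizing call like `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the failing call and return its error code
// from the enclosing function. On failure the sketch returns early and does
// not free earlier allocations.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed at %s:%d: %s\n", #call,     \
                         __FILE__, __LINE__, cudaGetErrorString(err_)); \
            return err_;                                                \
        }                                                               \
    } while (0)

// Hypothetical kernel: a grid-stride loop over fixed-width records that counts
// the records whose first byte is zero.
__global__ void touchRecords(const char* data, int records, int width, int* count)
{
    const int numThreads = blockDim.x * gridDim.x;
    const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = threadID; k < records; k += numThreads) {
        if (data[k * width] == 0) {
            atomicAdd(count, 1);
        }
    }
}

// Runs the kernel over a zero-filled buffer and writes the number of visited
// records to *visited. Returns the first CUDA error seen, or cudaSuccess.
cudaError_t runChecked(int records, int width, int* visited)
{
    char* d_data = nullptr;
    int* d_count = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, (size_t)records * width));
    CUDA_CHECK(cudaMemset(d_data, 0, (size_t)records * width));
    CUDA_CHECK(cudaMalloc(&d_count, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_count, 0, sizeof(int)));

    touchRecords<<<256, 256>>>(d_data, records, width, d_count);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(visited, d_count, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_count));
    CUDA_CHECK(cudaFree(d_data));
    return cudaSuccess;
}
```

On a working device, `runChecked(100000, 80, &visited)` should return `cudaSuccess` and set `visited` to 100000. Running the executable under `cuda-memcheck` (superseded by `compute-sanitizer` in newer CUDA toolkits) additionally reports the faulting access.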



The way your kernel is written in this question, your delete operator is not correctly positioned.

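You are executing `delete[] cstr` on every pass of your innermost for-loop. That is not correct: after the first iteration frees `cstr`, the remaining iterations write into freed memory and call `delete[]` on the same pointer again, which can corrupt the device heap and crash the kernel. You probably want it positioned like this: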

    __global__ void kernelExponentLoad(char* BiExponent,int lines){ 
        // BiExponent is formed from 80x100000 numbers
        const int numThreads = blockDim.x * gridDim.x;
        const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
        for (int k = threadID; k < 100000; k += numThreads){
            char* cstr = new char[80];
            for(int i=0; i<80; i++){    
                cstr[i] = BiExponent[(k*80)+i];
            }
            ...
            delete[] cstr;
        }
    }

Note that there are two closing braces after the `delete[]` and one before it, instead of all three after it as you have shown.
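
A different approach, not the one proposed in this answer: since every record has the same fixed length of 80 chars (per the comment in the question), each thread can use a fixed-size local array instead of allocating from the device heap with `new`/`delete[]`, so there is nothing to free and no dependence on the device heap size. The sketch below assumes that fixed length; the kernel `kernelExponentLoadLocal`, the output buffer `out` and the host helper `copyRecords` are hypothetical. Because the question elides the per-record work with `...`, the kernel here simply copies each record to `out` so the result can be checked on the host.

```cuda
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWidth = 80;  // chars per record, as stated in the question

// Same grid-stride loop as the question, but the 80-char scratch buffer is a
// fixed-size local array instead of a per-iteration new/delete[] pair.
__global__ void kernelExponentLoadLocal(const char* BiExponent, char* out, int records)
{
    const int numThreads = blockDim.x * gridDim.x;
    const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = threadID; k < records; k += numThreads) {
        char cstr[kWidth];  // scoped to one iteration; nothing to delete
        for (int i = 0; i < kWidth; i++) {
            cstr[i] = BiExponent[k * kWidth + i];
        }
        // ... per-record work would go here; this sketch just copies it out.
        for (int i = 0; i < kWidth; i++) {
            out[k * kWidth + i] = cstr[i];
        }
    }
}

// Sends host data through the kernel and back. Returns true only if every CUDA
// call succeeded and every byte round-tripped unchanged.
bool copyRecords(const std::vector<char>& host, std::vector<char>& result)
{
    const size_t bytes = host.size();
    const int records = static_cast<int>(bytes / kWidth);
    result.assign(bytes, '\0');
    char* d_in = nullptr;
    char* d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        kernelExponentLoadLocal<<<128, 256>>>(d_in, d_out, records);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree on a null pointer is a no-op
    cudaFree(d_out);
    return ok && std::memcmp(host.data(), result.data(), bytes) == 0;
}
```

With a `std::vector<char>` of `100000 * 80` bytes, `copyRecords(host, result)` should return `true` and leave `result` equal to `host`. Whether `cstr` is kept in registers or in per-thread local memory is up to the compiler; in both cases no explicit cleanup is needed.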

Robert Crovella