
I am just getting started with CUDA, and after going over the vector sum tutorials here I thought I would try something from scratch to really get my legs under me.

That said, I don't know if the trouble here is a simple fix or a myriad of issues.

The plain English description of my code is as follows:

First there is a counterClass that has members num and count. By setting count = 0 whenever count equals num, this counter keeps track of the remainder when dividing by num as we iterate up through the integers.
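
For illustration only (this host-only sketch is not part of the CUDA code below), the wrap-around update keeps count equal to n % num:

#include <assert.h>

int main(void){
    int num = 3;
    int count = 0;                  // counter created when n == 3, since 3 % 3 == 0
    for(int n = 4; n < 20; n++){
        count += 1;                 // same update the count kernel performs
        if(count == num){
            count = 0;              // wrap around when count reaches num
        }
        assert(count == n % num);   // invariant: count always equals n mod num
    }
    return 0;
}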

I have two kernels that I want to run in parallel. The first, count, increments all my counters (in parallel), and the second, check, tests whether any of the counters reads 0 (in parallel). If a counter reads 0, its num divides n evenly, meaning that n isn't prime.

While I would like my code to only print prime numbers, it prints all the numbers...

Here's the code:

#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int num;
    int count;
} counterClass;

counterClass new_counterClass(counterClass aCounter, int by, int count){
    aCounter.num = by;
    aCounter.count = count%by;
    return aCounter;
}

__global__ void count(counterClass *Counters){
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    Counters[idx].count+=1;                      // advance this counter for the new n
    if(Counters[idx].count == Counters[idx].num){
        Counters[idx].count = 0;                 // wrap around so count stays equal to n % num
    }
    __syncthreads();
}

__global__ void check(counterClass *Counters, bool *result){
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    if (Counters[idx].count == 0){
        *result = false;                         // some known prime divides n, so n is not prime
    }
    __syncthreads();
}

int main(){
    int tPrimes = 5;    // Total Primes to Find
    int nPrimes = 1;    // Number of Primes Found
    bool  *d_result, h_result=true;
    counterClass *h_counters =(counterClass *)malloc(tPrimes*sizeof(counterClass));
    h_counters[0]=new_counterClass(h_counters[0], 2 , 0);
    counterClass *d_counters;
    int n = 2;
    cudaMalloc((void **)&d_counters, tPrimes*sizeof(counterClass));
    cudaMalloc((void **)&d_result, sizeof(bool));
    cudaMemcpy(d_counters, h_counters, tPrimes*sizeof(counterClass), cudaMemcpyHostToDevice);
    while(nPrimes<tPrimes){
        h_result=true;
        cudaMemcpy(d_result, &h_result, sizeof(bool), cudaMemcpyHostToDevice);
        n+=1;
        count<<<1,nPrimes>>>(d_counters);
        check<<<1,nPrimes>>>(d_counters,d_result);
        cudaMemcpy(&h_result, d_result, sizeof(bool), cudaMemcpyDeviceToHost);
        if(h_result){
            printf("%d\n", n);
            cudaMemcpy(h_counters, d_counters, tPrimes*sizeof(counterClass), cudaMemcpyDeviceToHost);
            h_counters[nPrimes]=new_counterClass(h_counters[nPrimes], n , 0);
            nPrimes += 1;
            cudaMemcpy(d_counters, h_counters, tPrimes*sizeof(counterClass), cudaMemcpyHostToDevice);
        }
    }
}

There are some similar questions (CUDA - Sieve of Eratosthenes division into parts) and good examples posted as questions by people seeking to improve their code (CUDA Primes Generation and Low performance in CUDA prime number generator), but reading through these hasn't helped me figure out what is going wrong in my code!

Any advice on how to debug more effectively while working with CUDA would be appreciated, and if you can point out what I am doing wrong (because I know it's not the computer's fault) you will have my respect forever.

Edit:

Apparently this issue is only happening for me, so perhaps it's the way I'm running my code...

$ nvcc parraPrimes.cu -o primes
$ ./primes
3
4
5
6

Additionally, using cuda-memcheck as recommended:

$ cuda-memcheck ./primes
========= CUDA-MEMCHECK
3
4
5
6
========= ERROR SUMMARY: 0 errors

The output from dmesg | grep NVRM is as follows:

[    3.480443] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  304.131  Sun Nov  8 21:43:33 PST 2015

nvidia-smi is not installed on my system.

  • When I run your code, it prints out 3, 5, 7, 11. Are you sure that you are not experiencing any runtime errors? I see no API error checking in your code. Either add it or use cuda-memcheck. – talonmies Sep 21 '16 at 07:12
  • That's incredibly strange; when I run my code it prints 3, 4, 5, 6. – kpie Sep 21 '16 at 17:31
  • Your system configuration is broken. Normally cuda-memcheck will indicate this, but in this case, not. It will take some troubleshooting to figure out why. The next step would be to run `nvidia-smi` on that system, and see what it reports. If it seems to report that things are normal, then you will need to try proper CUDA error checking, which you can add to your code, or else run one of the CUDA sample codes such as vectorAdd. Finally, you probably want to get the output of `dmesg |grep NVRM` on that system. – Robert Crovella Sep 21 '16 at 22:45
  • If `nvidia-smi` is not installed on your system, then your system installation (GPU driver) is broken. Also, 304.131 is a "pretty old" GPU driver. What CUDA version are you using? Stated another way, what is the output of `nvcc --version`? – Robert Crovella Sep 22 '16 at 19:24
  • Ok, I had to install CUDA using the .deb file here (https://developer.nvidia.com/cuda-release-candidate-download). I'm not proud to admit that I bricked my computer twice trying to install from the .run file... – kpie Sep 23 '16 at 08:37
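
A minimal form of the API error checking suggested in the comments could look like the sketch below (the checkCuda helper and its placement are illustrative, not part of the original code). With a broken driver or runtime, a failed launch would then be reported instead of silently leaving d_result untouched:

#include <stdio.h>
#include <stdlib.h>

// Abort with a message on any CUDA API error (helper name is illustrative).
static void checkCuda(cudaError_t err, const char *what){
    if(err != cudaSuccess){
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Example usage around the kernel launches inside the while loop:
//     count<<<1,nPrimes>>>(d_counters);
//     checkCuda(cudaGetLastError(), "count launch");
//     check<<<1,nPrimes>>>(d_counters, d_result);
//     checkCuda(cudaGetLastError(), "check launch");
//     checkCuda(cudaDeviceSynchronize(), "kernel execution");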

1 Answer


Installing the nvidia-cuda-toolkit package through apt does not install CUDA.

You can install CUDA from NVIDIA's website. (Use the .deb.)

  • There's no particular reason you have to use the .deb. Before attempting to install CUDA, especially if you've not done it before, I would recommend reading the CUDA installation guide for the platform in question (i.e. linux in this case). Both package manager methods (.deb) and runfile installer methods are viable, but each have to be done correctly. The install guide gives the correct methodology and usage for each. – Robert Crovella Sep 23 '16 at 14:33
  • I used the install guide; the .run file turned my computer into a paperweight twice. I followed the installation guide to the best of my ability, and perhaps I made a mistake, but from my experience I would recommend that anyone installing CUDA on Ubuntu 16.04 use the .deb, not the .run. Additionally, I have read that the .run file is out of scope for automatic updates, meaning that installing with the .deb takes less upkeep. – kpie Sep 24 '16 at 15:31