/* Include libraries */
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>


int main(void)
{
    int a = 1;
    int c = 2;
    int *deva, *devc;

    cudaMalloc( (void**)&deva, sizeof(int) );
    cudaMalloc( (void**)&devc, sizeof(int) );

    cudaMemcpy( deva, &a, sizeof(int), cudaMemcpyHostToDevice );
    cudaMemcpy( devc, &c, sizeof(int), cudaMemcpyHostToDevice );

    cudaMemcpy( &c, deva, sizeof(int), cudaMemcpyDeviceToHost );
    cudaMemcpy( &a, devc, sizeof(int), cudaMemcpyDeviceToHost );

    printf("\n%d %d\n", a, c); // Output (should be "2 1")

    cudaFree( deva );
    cudaFree( devc );

    return 0;
}

This simple code should swap a=1 and c=2 and print "2 1", but it does nothing. And this isn't the only "simple" textbook example that fails for me. For instance, why do all the textbooks say I can declare a and c as pointers and fill in their values later, when the program won't compile if I do that? What am I overlooking here?

Aziz Shaikh
  • You have no kernel to run. – Mihai Maruseac Dec 10 '13 at 02:08
  • The code does work for me. – Sagar Masuti Dec 10 '13 at 02:11
  • You can do the error checking as mentioned in [here](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) Most likely you will find the problem yourself. – Sagar Masuti Dec 10 '13 at 02:22
  • Mihai Maruseac I don't need a kernel to simply swap values, do I? Sagar Masuti Does it work for you without you having changed anything? – user3085127 Dec 10 '13 at 02:30
  • It does work for me without any changes. [Check this](http://pastebin.com/5rY5rfRN) – Sagar Masuti Dec 10 '13 at 02:40
  • I found these threads: [link](http://stackoverflow.com/questions/15178285/wrong-results-in-cudamemcpydevicetohost?rq=1) and [link](http://stackoverflow.com/questions/15177019/cuda-program-does-not-give-the-correct-output-when-using-a-cuda-compatible-gpu) those might explain some things. I am using CUDA 5.0 while the remote computer I compile on has not been updated in 2 years (before CUDA 5.0). Can't believe I wasted 6 hours thinking I made some crucial mistake somewhere... – user3085127 Dec 10 '13 at 02:45
  • The code works also for me. Which architecture are you targeting? What is the compilation string? – Vitality Dec 10 '13 at 06:10
  • My .cu file is called 4.cu, I compile with "nvcc 4.cu -o 4", then run with "./4", nothing more. I don't know the type of the graphic card of the computer because I access it through a university server, it's not mine (all I know is it's at least 2 years old and probably a quadro, there is a possibility the drivers on the card are outdated and that my CUDA version is too recent but then again my script is as basic as it gets). – user3085127 Dec 10 '13 at 15:44
  • As already suggested, if you add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code, you'll have a much better idea of what is going wrong, even when you are running it remotely. – Robert Crovella Dec 10 '13 at 16:44
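The error checking the comments link to amounts to wrapping every runtime call and aborting on failure. A minimal sketch of that pattern (the macro name here is illustrative, not from the linked answer):

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call; print the error string and exit on failure.
#define gpuErrchk(call)                                              \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                    cudaGetErrorString(err), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Usage in the question's code, e.g.:
//   gpuErrchk( cudaMalloc((void**)&deva, sizeof(int)) );
//   gpuErrchk( cudaMemcpy(deva, &a, sizeof(int), cudaMemcpyHostToDevice) );
```

With this in place, a missing or inaccessible device shows up immediately on the first wrapped call instead of failing silently.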

2 Answers


Add a kernel function with a `__global__` declaration:

__global__ void swap(int *a, int *c)
{
    // Swap using the XOR operator
    *a = *a ^ *c;
    *c = *a ^ *c;
    *a = *a ^ *c;
}

Call the kernel from the main function as `swap<<<1,1>>>(deva, devc);` (the kernel takes two pointers, and swapping a single value only needs one block with one thread).
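Putting the kernel together with the host code from the question, a minimal complete version might look like this (a sketch, not tested on your setup; error checking omitted for brevity):

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void swap(int *a, int *c)
{
    // XOR swap on device memory
    *a = *a ^ *c;
    *c = *a ^ *c;
    *a = *a ^ *c;
}

int main(void)
{
    int a = 1, c = 2;
    int *deva, *devc;

    cudaMalloc((void**)&deva, sizeof(int));
    cudaMalloc((void**)&devc, sizeof(int));

    cudaMemcpy(deva, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(devc, &c, sizeof(int), cudaMemcpyHostToDevice);

    swap<<<1, 1>>>(deva, devc);   // one block, one thread

    cudaMemcpy(&a, deva, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&c, devc, sizeof(int), cudaMemcpyDeviceToHost);

    printf("%d %d\n", a, c);      // expect "2 1"

    cudaFree(deva);
    cudaFree(devc);
    return 0;
}
```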

For further reference, see http://on-demand.gputechconf.com/gtc-express/2011/presentations/GTC_Express_Sarah_Tariq_June2011.pdf

Arun
  • `__global__ void swap(void)` ? Did you mean `__global__ void swap(int *a, int *c)` ? – Sagar Masuti Dec 11 '13 at 01:29
  • @SagarMasuti i changed it...its actually void swap(int *a,int *c)...Thank u! – Arun Dec 11 '13 at 09:39
  • There are many problems with this code. The swapping method is a bad idea. The number of reads and writes to global memory is significantly higher than a kernel that makes use of a single temporary storage value, and so it will run more slowly. The kernel, as written, only knows how to swap a single value, and is missing many aspects of a normal cuda kernel. Furthermore, calling kernels with `<<<N,1>>>` parameters is a *bad* way to structure CUDA code. It appears in the linked presentation, but is for education only, as students are being introduced to the concepts of blocks and threads. – Robert Crovella Dec 12 '13 at 19:56
  • @RobertCrovella: Thank you for the comments! Will try to solve the overhead on execution. And, in CUDA we actually call the kernel with `<<<...>>>(parameters)`, right? How to structure it in a good way? – Arun Dec 13 '13 at 02:04

Ok, I applied this to the parallel parts of my code and the result is a single error message: "no CUDA-capable device is detected". So I guess something in the network isn't working properly or I don't have all the permissions.

UPDATE: the network administrator changed some settings and everything works now.

Community