
In Java we can declare a static variable: if a variable is declared `static`, there is a single copy of it throughout the program. If I declare a variable inside a CUDA kernel, each kernel creates its own copy of that variable (multiple independent copies).

For example, suppose `int x = 5` is initialized and we launch two kernels. If we change the value of x to 6 (`x = 6`) in one kernel, the change is not visible in the other kernel (the value of x there remains 5).

I want to declare a static variable in CUDA that every kernel can access: if one kernel changes the variable, the change should be visible in the other kernels (if one kernel sets `x = 6`, the other kernels should also see `x = 6`).

I need this to find out whether a number exists in a matrix or not. For example, given the following matrix and a number to find (say 5):

    2 3 0 0 0
    1 4 5 0 0
    7 8 0 0 0
    0 0 0 0 0
    0 0 0 0 0

I should get yes, row = 1 and col = 2 (assuming rows and columns are numbered from 0).

  • Maybe you should provide more details about where you need such behaviour, because the general advice would be something like: don't do it, or use [atomic functions](http://docs.nvidia.com/cuda/cuda-c-programming-guide/#atomic-functions). The latter, if you don't understand where to use them, won't give you satisfactory performance... – havogt Feb 16 '16 at 15:10
  • Atomic functions allow you to increment a variable the way you want. But your threads always execute in a warp (a group of 32 threads that execute the same instruction at the same time, i.e. all read the same value at the same time). The problem is well explained here: http://stackoverflow.com/a/21346015/4866974 – X3liF Feb 16 '16 at 15:29
  • @havogt, thank you for helping. The reason I need this is that I want to search whether a number exists in a matrix: if that number exists, a Boolean value should be set to true, and false if it doesn't –  Feb 16 '16 at 15:35
  • If all you need to know is that there is _at least_ one element, you could just write the `bool` to global memory. Maybe more than one thread will write, but at least one. – havogt Feb 16 '16 at 15:43
  • @havogt thank you , I'll search how to write to global memory –  Feb 16 '16 at 15:49
  • But let me add that this is not a very smart way. The smarter and more general way would be to use [parallel reduction](http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf). You might also want to have a look in the [thrust library](https://thrust.github.io/) which offers reduce functions and a `find()` function. – havogt Feb 16 '16 at 15:54
  • @havogt thanks again. When we find the number, can we also find the row and column of that number? –  Feb 16 '16 at 16:01
  • In the lazy version you could write back an integer instead of a bool. However if there is more than one entry that matches your criterion, it is unspecified which location will be written... – havogt Feb 16 '16 at 16:05
  • @havogt in my case there will be only unique numbers. All I want is to check whether a number exists or not and, if it exists, what its row and column numbers are –  Feb 16 '16 at 16:16
  • If you can guarantee that every entry is unique, then it is indeed very simple: Just write back the row and col to a variable in global memory. There is no race condition... – havogt Feb 16 '16 at 16:20
  • Maybe you should rephrase your question in a way that allows a reasonable answer... – havogt Feb 16 '16 at 16:22
  • @havogt but how do I get the row and col, is there any method? –  Feb 16 '16 at 16:22
  • How should I know how you represent your matrix? Please provide a minimal example of your problem in the question. – havogt Feb 16 '16 at 16:25
  • @havogt I have edited the question and shown a small example of my problem. Please help me. –  Feb 16 '16 at 16:39
  • Help you to do what? I can help you learn how to write a good question for SO by pointing you to [mcve] and [How do I ask a good question?](http://stackoverflow.com/help/how-to-ask) – havogt Feb 16 '16 at 16:47
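The thrust route suggested in the comments can be sketched as follows. This is a minimal illustration, not part of the original thread: it assumes the matrix is stored row-major in one flat array, and `findWithThrust` is a hypothetical helper name. `thrust::find` returns an iterator to the first element equal to the target (or `end()` if there is none), and the flat index converts to row and column with `/` and `%`.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/find.h>

// Hypothetical helper: searches a row-major rows x cols matrix for `target`.
// Returns true and sets *row / *col to the first match (lowest flat index).
bool findWithThrust(const int* h_matrix, int rows, int cols, int target,
                    int* row, int* col)
{
    // Copy the flattened matrix to the device.
    thrust::device_vector<int> d_matrix(h_matrix, h_matrix + rows * cols);
    auto it = thrust::find(d_matrix.begin(), d_matrix.end(), target);
    if (it == d_matrix.end()) return false;  // target not present
    int idx = static_cast<int>(it - d_matrix.begin());
    *row = idx / cols;  // row-major: flat index -> row
    *col = idx % cols;  // row-major: flat index -> column
    return true;
}

int main()
{
    const int rows = 5, cols = 5;
    const int matrix[rows * cols] = {
        2, 3, 0, 0, 0,
        1, 4, 5, 0, 0,
        7, 8, 0, 0, 0,
        0, 0, 0, 0, 0,
        0, 0, 0, 0, 0};
    int row = -1, col = -1;
    if (findWithThrust(matrix, rows, cols, 5, &row, &col))
        printf("yes, row = %d, col = %d\n", row, col);  // yes, row = 1, col = 2
    else
        printf("no\n");
    return 0;
}
```

Because `thrust::find` always reports the first match, this variant also stays deterministic when entries are not unique.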

1 Answer


For the use case you describe, a static variable that is accessible from device code is created using the `__device__` qualifier. Refer to the documentation.
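A minimal sketch of the difference the question describes, assuming a CUDA-capable GPU and `nvcc`; the names `global_x`, `writer`, `reader` and `readAfterWrite` are hypothetical, and error checking is omitted for brevity. A variable declared inside a kernel is private to each thread, while a file-scope `__device__` variable has a single copy in device global memory that later kernel launches can read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// File-scope __device__ variable: one copy in device global memory,
// shared by all threads of all kernels launched by this program.
__device__ int global_x = 5;

__global__ void writer()
{
    int local_x = 5;  // per-thread copy, lives only in this thread
    local_x = 6;      // this change is never visible elsewhere
    global_x = 6;     // this change is visible to later kernels
    (void)local_x;    // silence "set but never used" warnings
}

__global__ void reader(int* out)
{
    *out = global_x;  // sees the value written by writer()
}

// Launches writer() then reader() and returns what reader() saw.
int readAfterWrite()
{
    int h_out = -1;
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    writer<<<1, 1>>>();
    reader<<<1, 1>>>(d_out);  // same (default) stream: runs after writer()
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("global_x seen by the second kernel: %d\n", readAfterWrite());
    return 0;
}
```

The second kernel reports 6, because both launches share the one `global_x`; `local_x` has no such cross-launch lifetime.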

In addition, for the use case you describe (inter-thread/block/kernel communication), I would also mark that variable with the `volatile` qualifier. Refer to the documentation.

Something like this:

    __device__ volatile int found = 0;

or

    __device__ volatile bool found = false;

The CUDA programming guide contains an example that uses this construct for inter-thread communication.

You can then set that variable to 1 or true from any thread, and later query it for status.
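On the host side, one way to reset and query such a flag is with the runtime calls `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol`. The sketch below is illustrative only: `markIfPresent` and `isPresent` are hypothetical names, the data is treated as a flat array, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ volatile int found = 0;

// Hypothetical kernel: every thread whose element equals target sets the flag.
__global__ void markIfPresent(const int* data, int n, int target)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] == target)
        found = 1;  // all writers store the same value
}

// Host helper: resets the flag, runs the kernel, reads the flag back.
bool isPresent(const int* h_data, int n, int target)
{
    int zero = 0, h_found = 0;
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(found, &zero, sizeof(int));  // reset before each search
    int threads = 256, blocks = (n + threads - 1) / threads;
    markIfPresent<<<blocks, threads>>>(d_data, n, target);
    // Device-to-host copy in the default stream waits for the kernel.
    cudaMemcpyFromSymbol(&h_found, found, sizeof(int));
    cudaFree(d_data);
    return h_found != 0;
}

int main()
{
    const int data[5] = {2, 3, 1, 4, 5};
    printf("5 present: %s\n", isPresent(data, 5, 5) ? "yes" : "no");
    return 0;
}
```

Resetting the flag before every launch matters because a `__device__` variable keeps its value between kernel launches.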

Since your use-case description only involves setting the variable to a single value, regardless of which thread does it, there is no concern about simultaneous access from multiple threads, as long as the only operation you do is a write to that variable:

    found = 1;

(and for this specific case, volatile may not be necessary either, depending on your exact usage.)
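If entries are not guaranteed to be unique (the case raised in the comments, where it is unspecified which plain write wins), one option, offered here as an assumption rather than part of this answer, is the atomic function `atomicMin` on the flattened index, which keeps the lowest matching index deterministically. `first_idx`, `findFirst` and `firstMatch` are hypothetical names; error checking is omitted.

```cuda
#include <climits>
#include <cstdio>
#include <cuda_runtime.h>

// Lowest flat index of a match seen so far; INT_MAX means "not found".
__device__ int first_idx = INT_MAX;

__global__ void findFirst(const int* m, int n, int target)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && m[i] == target)
        atomicMin(&first_idx, i);  // atomic read-modify-write: no lost updates
}

// Returns the flat index of the first match, or -1 if there is none.
int firstMatch(const int* h_m, int n, int target)
{
    int init = INT_MAX, h_idx = INT_MAX;
    int* d_m = nullptr;
    cudaMalloc(&d_m, n * sizeof(int));
    cudaMemcpy(d_m, h_m, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(first_idx, &init, sizeof(int));  // reset
    findFirst<<<(n + 255) / 256, 256>>>(d_m, n, target);
    cudaMemcpyFromSymbol(&h_idx, first_idx, sizeof(int));
    cudaFree(d_m);
    return h_idx == INT_MAX ? -1 : h_idx;
}

int main()
{
    const int cols = 3;
    const int m[6] = {5, 0, 1, 2, 5, 5};  // 5 appears three times
    int idx = firstMatch(m, 6, 5);
    if (idx >= 0)
        printf("first match: row = %d, col = %d\n", idx / cols, idx % cols);
    else
        printf("no match\n");
    return 0;
}
```

Here the program reports row 0, col 0 regardless of which threads happen to run first.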

If you know that only one thread will find the item, and you also wish to record the x,y coordinates, that would be a trivial extension:

    __device__ volatile int found = 0;
    __device__ volatile int x = -1;
    __device__ volatile int y = -1;

then your device code could be:

    // item_found and item stand for your own search logic
    if (item_found) {
      found = 1;
      x = item.x;
      y = item.y;
    }
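Putting these pieces together for the 5 x 5 matrix from the question, a minimal end-to-end sketch might look like this. It assumes a row-major matrix, unique entries (as the asker states), one thread per element, and that `x` holds the column and `y` the row; `searchKernel` and `searchMatrix` are hypothetical names, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ volatile int found = 0;
__device__ volatile int x = -1;  // column of the match (assumed mapping)
__device__ volatile int y = -1;  // row of the match

// One thread per matrix element; the matrix is stored row-major.
__global__ void searchKernel(const int* m, int rows, int cols, int target)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols && m[row * cols + col] == target) {
        // Entries are assumed unique, so at most one thread gets here.
        found = 1;
        x = col;
        y = row;
    }
}

// Host wrapper: returns true and fills *row / *col if target is present.
bool searchMatrix(const int* h_m, int rows, int cols, int target,
                  int* row, int* col)
{
    int h_found = 0, zero = 0, minus1 = -1;
    int* d_m = nullptr;
    size_t bytes = rows * cols * sizeof(int);
    cudaMalloc(&d_m, bytes);
    cudaMemcpy(d_m, h_m, bytes, cudaMemcpyHostToDevice);
    // Reset the device-side results left over from any earlier search.
    cudaMemcpyToSymbol(found, &zero, sizeof(int));
    cudaMemcpyToSymbol(x, &minus1, sizeof(int));
    cudaMemcpyToSymbol(y, &minus1, sizeof(int));
    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    searchKernel<<<grid, block>>>(d_m, rows, cols, target);
    cudaMemcpyFromSymbol(&h_found, found, sizeof(int));
    cudaMemcpyFromSymbol(col, x, sizeof(int));
    cudaMemcpyFromSymbol(row, y, sizeof(int));
    cudaFree(d_m);
    return h_found != 0;
}

int main()
{
    const int rows = 5, cols = 5;
    const int m[rows * cols] = {
        2, 3, 0, 0, 0,
        1, 4, 5, 0, 0,
        7, 8, 0, 0, 0,
        0, 0, 0, 0, 0,
        0, 0, 0, 0, 0};
    int row = -1, col = -1;
    if (searchMatrix(m, rows, cols, 5, &row, &col))
        printf("yes, row = %d, col = %d\n", row, col);
    else
        printf("no\n");
    return 0;
}
```

For the question's matrix this should print `yes, row = 1, col = 2`.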
Robert Crovella