
There are two arrays named A and B that correspond to each other, and their space is allocated while the kernel is running: A[i] holds a position and B[i] holds the corresponding value. Every thread does the following:

  1. If the current thread's position is already in A, update the corresponding entry of B.
  2. Otherwise, expand A and B and insert the current thread's data into them.
  3. The initial size of A and B is zero.

Is such an implementation supported by CUDA?
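A sketch of the intended per-thread logic (ignoring synchronization and reallocation, which are exactly the open questions; `find` and `update_or_insert` are my own names, and A, B, and their current size are assumed to live in global memory):

```cuda
// Sketch of the intent only, not a complete solution.
__device__ int find(const int* A, int size, int pos)
{
    for (int i = 0; i < size; ++i)
        if (A[i] == pos) return i;   // position already present
    return -1;
}

__device__ void update_or_insert(int* A, float* B, int* size, int pos, float val)
{
    int i = find(A, *size, pos);
    if (i >= 0) {
        B[i] = val;                  // 1. position found: update the value
    } else {
        // 2. not found: A and B would have to grow here, then append
        A[*size] = pos;
        B[*size] = val;
        *size += 1;
    }
}
```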

  • Could you please clarify point #1? – Vitality Sep 13 '13 at 08:15
  • Point #1 means that A[i] and B[i] store the position and value of the i-th element; the current thread may update B[i] if the position of the current thread's element is already in array A. – taoyuanjl Sep 13 '13 at 08:46

1 Answer


Concerning point #2, you would need something like C's realloc(), which, as far as I know, is not supported in CUDA device code. You can write your own realloc() along the lines of this post:

CUDA: Using realloc inside kernel

but I do not know how efficient this solution would be.
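A minimal sketch of such a hand-rolled device-side reallocation, assuming a device of compute capability 2.0 or higher (required for in-kernel malloc()/free()); my_realloc is a hypothetical name, not part of the CUDA API:

```cuda
// Hypothetical device-side realloc built from in-kernel malloc/free.
// Requires compute capability >= 2.0 and a sufficiently large device
// heap, see cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...).
__device__ void* my_realloc(void* old_ptr, size_t old_size, size_t new_size)
{
    void* new_ptr = malloc(new_size);          // in-kernel allocation
    if (new_ptr != NULL && old_ptr != NULL) {
        // copy the smaller of the two sizes, then release the old block
        memcpy(new_ptr, old_ptr, old_size < new_size ? old_size : new_size);
        free(old_ptr);
    }
    return new_ptr;
}
```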

Alternatively, you could pre-allocate a "large" amount of global memory to cover the worst-case memory occupation scenario.
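A minimal sketch of the pre-allocation approach, assuming the worst case is one entry per thread; insert_kernel and d_count are hypothetical names, and the search/update part of your logic is omitted:

```cuda
// Hypothetical sketch: A and B are pre-allocated on the host for the
// worst case (one entry per thread); a device counter hands out slots.
__device__ int d_count = 0;  // number of valid entries in A and B

__global__ void insert_kernel(int* A, float* B,
                              const int* pos, const float* val)
{
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int slot = atomicAdd(&d_count, 1);  // claim the next free slot
    A[slot]  = pos[tid];                // position
    B[slot]  = val[tid];                // value
}
```

On the host, you would cudaMalloc() both arrays for the worst-case element count before launching the kernel.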

  • Thanks a lot! Another question: how do I guarantee atomicity if more than one thread updates A and B at the same time? – taoyuanjl Sep 13 '13 at 08:37
  • Have a look at the CUDA C Programming Guide, Section B.11. There you will find information on how to use atomic operations in CUDA. – Vitality Sep 13 '13 at 11:05
  • The atomic operations in Section B.11 operate on a single word of global or shared memory, such as B[i]; I want to guarantee atomicity for the whole array, so that other threads are refused access while one thread is working on it. – taoyuanjl Sep 13 '13 at 11:43
  • 2
    You might consider using a critical section to control access to the array but there are challenges and difficulties. Search on cuda critical section in the upper right corner – Robert Crovella Sep 13 '13 at 12:35
  • Threads in the same block would all contend for the critical section, so this will lead to deadlock. – taoyuanjl Sep 13 '13 at 13:30
  • 1
    Yes, I mentioned there would be challenges and difficulties. You could consider using a critical section to manage inter-block access, while using the ordinary threadblock communications methods (shared memory, `__syncthreads()`, etc.) to handle arbitration within a threadblock. – Robert Crovella Sep 15 '13 at 13:32