
I get confused reading about setting proper values for the number of threads and blocks in CUDA programming. After reading several guides and many tips I still don't find the answer I am searching for. My GPU: Nvidia GT 645M, compute capability 3.0.

From what I know:
The maximum number of threads per block is in my case 1024 (e.g. 32 x 32)
The maximum number of blocks in a grid is in my case 2**31 - 1 = 2147483647 blocks (in the x-dimension)
Multiprocessor count = 2
The number of blocks depends on the input data: (number of input elements) / (number of threads per block), rounded up (see the sketch below)
For input data like [1,2,...,10] + [1,2,...,10] I need 10 threads and 1 block.
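So for the block count I compute it in plain Python like this (assuming ceiling division is the right way to round up when the data size is not a multiple of the block size):

n = 10
threads_per_block = 1024
blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division, = 1 for n = 10
print(blocks)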

My computing problem:
For example I have input data like this:

import numpy as np

n = 10
x = np.arange(n).astype(np.float32)
y = x + 1

I try to perform operations like '+', '-', '*' on these vectors, value by value.

Question 1:
My understanding:
The GPU CUDA calculation works like this:
for each value in the numpy array --> a CUDA block is used with one thread.

I mean:
(x = [0,1,...,9]) + (y = [1,2,...,10]) = x[0] + y[0] in block(0,0), thread(0,0),
then x[1] + y[1] in block(0,0), thread(1,0), and so on.
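Something like this sketch is what I imagine (I'm assuming Numba's cuda.jit here, since my snippet above is plain numpy, and the indexing pattern is my guess):

from numba import cuda
import numpy as np

@cuda.jit
def add_kernel(x, y, out):
    # global index of this thread = block offset + position inside the block
    i = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    if i < x.size:              # ignore threads that fall past the end of the data
        out[i] = x[i] + y[i]

n = 10
x = np.arange(n).astype(np.float32)
y = x + 1
out = np.zeros_like(x)
add_kernel[1, n](x, y, out)     # 1 block of 10 threads
print(out)                      # [ 1.  3.  5.  7.  9. 11. 13. 15. 17. 19.]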

Is that correct?

Question 2:
Let's say: thread count = 5
block count = 1

Then will all the threads in this one block run 2 times to compute x + y? (see the sketch below)
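What I mean is something like this sketch (again assuming Numba; the grid-stride loop is my guess at how one thread handles more than one element):

from numba import cuda
import numpy as np

@cuda.jit
def add_strided(x, y, out):
    start = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    stride = cuda.blockDim.x * cuda.gridDim.x   # 5 threads * 1 block = 5
    for i in range(start, x.size, stride):      # thread 0 does elements 0 and 5, thread 1 does 1 and 6, ...
        out[i] = x[i] + y[i]

n = 10
x = np.arange(n).astype(np.float32)
y = x + 1
out = np.zeros_like(x)
add_strided[1, 5](x, y, out)    # 1 block of 5 threads covers all 10 elements
print(out)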

Question 3:
How many blocks can run simultaneously at one time?
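I can query my device from Python like this (assuming Numba exposes the CUDA device attributes under these names), but I still don't know how many blocks can actually be resident at the same time:

from numba import cuda

dev = cuda.get_current_device()
print(dev.name)                    # my card: GT 645M
print(dev.MULTIPROCESSOR_COUNT)    # 2 on my card
print(dev.MAX_THREADS_PER_BLOCK)   # 1024
print(dev.MAX_GRID_DIM_X)          # 2147483647 = 2**31 - 1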

If you can explain the step-by-step calculation on CUDA with a simple vector example, that would be nice.

Thanks for all your help. Please don't give any links to the CUDA guide, I get confused reading it. Please give simple examples :)

  • And once you have read that marked duplicate, you will want to read https://stackoverflow.com/q/9985912/681865 – talonmies May 13 '18 at 02:38
  • Thank you, but that doesn't explain all my questions –  May 13 '18 at 07:35
  • If you read everything at both questions carefully and study the code in the linked duplicate, it will answer all your questions. – talonmies May 13 '18 at 11:02
  • There is no relationship between the size of any input to a kernel and the behaviour of the blocks which you run. You must explicitly choose the total number of threads and blocks which will run yourself and design the behaviour of the kernel code accordingly. And you do that by following the instructions at the two questions I linked to earlier. – talonmies May 13 '18 at 15:01
  • Ok, part of my questions is solved, I'll try to read the full CUDA guide to find the rest of the answers. –  May 13 '18 at 18:01

0 Answers