I need to compute a linear index for my threads in such a way that I can be sure the first 32 of them belong to the same warp, i.e., the linear index must follow the order in which warps are formed internally. In other words, is the linear index used to form warps C-like (row-major) or Fortran-like (column-major)? To illustrate, consider a block of threads of size 2x5. I can create a linear index that follows either the C or the Fortran convention:
0, 1, 2, 3, 4
5, 6, 7, 8, 9
vs.
0, 2, 4, 6, 8
1, 3, 5, 7, 9
For a large array, I want to be sure that my first 32 threads are all in the same warp. What is the correct way to generate the linear index?
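
To make the two candidate conventions concrete, here is a minimal host-side sketch of both index formulas. The function names are hypothetical and chosen only for illustration; in a kernel, `x` and `y` would come from `threadIdx.x` and `threadIdx.y`, and `dim_x` and `dim_y` from `blockDim.x` and `blockDim.y`. As far as I can tell from the CUDA programming guide, the thread ID of a 2D block of size (Dx, Dy) is defined as `x + y * Dx`, and warps are built from consecutive thread IDs, which would correspond to the x-fastest variant below. A warp size of 32 is assumed.

```cpp
// Hypothetical helpers for illustration only, not part of any CUDA API.

// x varies fastest: threads with consecutive x get consecutive linear indices.
// With dim_x = 5 this reproduces the first layout (0..4 on the first row).
constexpr unsigned linear_x_fastest(unsigned x, unsigned y, unsigned dim_x) {
    return x + y * dim_x;
}

// y varies fastest: threads with consecutive y get consecutive linear indices.
// With dim_y = 2 this reproduces the second layout (0, 2, 4, 6, 8 on the first row).
constexpr unsigned linear_y_fastest(unsigned x, unsigned y, unsigned dim_y) {
    return y + x * dim_y;
}

// Warp that a given linear index falls into, assuming warps of 32
// consecutive linear indices.
constexpr unsigned warp_of(unsigned linear_id, unsigned warp_size = 32) {
    return linear_id / warp_size;
}
```

Under that assumption, indices 0 through 31 from the matching formula all map to warp 0, and index 32 is the first thread of warp 1.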