I know how to copy an array from the host to the GPU. But what if I have a column-major matrix stored on the host that I want to copy into a row-major buffer on the GPU?

Is there a way to do this other than copying one element at a time in a for loop?

A_host = [0 3 6 1 4 7 2 5 8] (column-major, on the host)

GPUBuffer = [0 1 2 3 4 5 6 7 8] (desired layout on the GPU)

Poul K. Sørensen

1 Answer

In that case, and if the matrix is sufficiently large, you may want to send it "as-is" to the GPU in a single copy and then run an additional transpose kernel on the device (or merge the transpose into your first kernel by reading the input with swapped indices).
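As a rough illustration of the transpose step, here is a minimal sketch, assuming an OpenCL setup with `float` elements and one work-item per element. The kernel name `transpose_naive`, the variable `transpose_kernel_src`, and the helper `transpose_reference` are made up for this example and are not from the answer. The C function applies the same index mapping on the CPU, so the result can be checked without a device.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a naive out-of-place transpose (hypothetical name).
 * Assumed launch: one work-item per element, global size = (cols, rows).
 * src holds a rows x cols matrix in column-major order;
 * dst receives the same matrix in row-major order. */
const char *transpose_kernel_src =
    "__kernel void transpose_naive(__global const float *src,\n"
    "                              __global float *dst,\n"
    "                              const int rows, const int cols)\n"
    "{\n"
    "    const int c = get_global_id(0);\n"
    "    const int r = get_global_id(1);\n"
    "    if (r < rows && c < cols)\n"
    "        dst[r * cols + c] = src[c * rows + r];\n"
    "}\n";

/* CPU reference using the same index mapping as the kernel above. */
void transpose_reference(const float *src, float *dst, int rows, int cols)
{
    assert(src != dst); /* out-of-place only */
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[(size_t)r * cols + c] = src[(size_t)c * rows + r];
}
```

Applied to the 3 x 3 `A_host` from the question, this mapping produces the `GPUBuffer` layout. A naive kernel like this reads with a stride, so for large matrices a tiled version using local memory is usually faster; that refinement is left out here.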

Eric Bainville
  • Is it normal practice to write multiple kernels for the same task in cases like this, to avoid needing a transpose or an extra memory copy? In a different setting I had a 60,000 x 784 matrix and needed to copy 100 rows to the GPU; I just issued 100 copies in a for loop. Should I instead gather the rows on the host and issue a single copy command? – Poul K. Sørensen Mar 16 '13 at 08:31
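
For the single-copy idea raised in this comment, a minimal host-side sketch might look like the following. It assumes the matrix is stored row-major with `float` elements, so each row is contiguous; the helper name `gather_rows` and its parameters are hypothetical. The selected rows are packed into one staging buffer, which could then be uploaded with a single `clEnqueueWriteBuffer` call instead of 100 separate ones.

```c
#include <stddef.h>
#include <string.h>

/* Pack n selected rows of a row-major matrix into a contiguous staging
 * buffer (hypothetical helper). staging must hold n * cols floats.
 * After this, one clEnqueueWriteBuffer of n * cols * sizeof(float) bytes
 * can replace n separate per-row copies. */
void gather_rows(const float *matrix, size_t cols,
                 const size_t *row_idx, size_t n, float *staging)
{
    for (size_t i = 0; i < n; ++i)
        memcpy(staging + i * cols,
               matrix + row_idx[i] * cols,
               cols * sizeof(float));
}
```

If the 100 rows happen to be adjacent in memory, no packing is needed and a single write with the right host offset already covers them. Whether packing beats many small transfers depends on the driver and row size, so it is worth measuring.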